HUMAN GENOME-DERIVED SINGLE EXON NUCLEIC ACID PROBES USEFUL
FOR ANALYSIS OF GENE EXPRESSION IN HUMAN BONE MARROW



CROSS REFERENCE TO RELATED APPLICATIONS 


The present application is a continuation-in-part of U.S.
patent application serial nos. 09/632,366, filed August 3,
2000 and 09/608,408, filed June 30, 2000; claims the
benefit under 35 U.S.C. § 119(e) of U.S. provisional patent
application serial nos. 60/236,359, filed September 27,
2000, 60/234,687, filed September 21, 2000, 60/207,456,
filed May 26, 2000, and 60/180,312, filed February 4, 2000;
and further claims the benefit under 35 U.S.C. § 119(a) of
UK patent application no. 0024263.6, filed October 4, 2000,
the disclosures of which are incorporated herein by
reference in their entireties.

REFERENCE TO SEQUENCE LISTING AND INCORPORATION BY 
REFERENCE THEREOF 


The present application includes a Sequence Listing in
electronic format, filed pursuant to PCT Administrative
Instructions 801 - 806 on a single CD-R disc, in
triplicate, containing a file named pto_BONE_MARROW.txt,
created 24 January 2001, having 26,421,347 bytes. The
Sequence Listing contained in said file on said disc is
incorporated herein by reference in its entirety.

Field of the Invention 


The present invention relates to genome-derived
single exon microarrays useful for verifying the expression
of regions of genomic DNA predicted to encode protein. In
particular, the present invention relates to unique
genome-derived single exon nucleic acid probes expressed in
human bone marrow and single exon nucleic acid microarrays
that include such probes.



Background of the Invention 
For almost two decades following the invention of
general techniques for nucleic acid sequencing, Sanger et
al., Proc. Natl. Acad. Sci. USA 70(4):1209-13 (1973);
Gilbert et al., Proc. Natl. Acad. Sci. USA 70(12):3581-4
(1973), these techniques were used principally as tools to
further the understanding of proteins, known or
suspected, about which a basic foundation of biological
knowledge had already been built. In many cases, the
cloning effort that preceded sequence identification had
been both informed and directed by that antecedent
biological understanding.

For example, the cloning of the T cell receptor
for antigen was predicated upon its known or suspected cell
type-specific expression, its suspected membrane
association, and the predicted assembly of its gene via
T cell-specific somatic recombination. Subsequent
sequencing efforts at once confirmed and extended
understanding of this family of proteins. Hedrick et al.,
Nature 308(5955):153-8 (1984).

More recently, however, the development of high
throughput sequencing methods and devices, in concert with
large public and private undertakings to sequence the human
and other genomes, has altered this investigational
paradigm: today, sequence information often precedes
understanding of the basic biology of the encoded protein
product.

One of the approaches to large-scale sequencing
is predicated upon the proposition that expressed
sequences (that is, those accessible through isolation of
mRNA) are of greatest initial interest. This "expressed
sequence tag" ("EST") approach has already yielded vast
amounts of sequence data (see for example Adams et al.,
Science 252:1651 (1991); Williamson, Drug Discov. Today
4:115 (1999)). For nucleic acids sequenced by this
approach, often the only biological information that is
known a priori with any certainty is the likelihood of
biologic expression itself. By virtue of the species and
tissue from which the mRNA had originally been obtained,
most such sequences are also annotated with the identity of
the species and at least one tissue in which expression
appears likely.

More recently, the pace of genomic sequencing has
accelerated dramatically. When genomic DNA serves as the
initial substrate for sequencing efforts, expression cannot
be presumed; often the only a priori biological information
about the sequence includes the species and chromosome (and
perhaps chromosomal map location) of origin.

With the ever-accelerating pace of sequence
accumulation by directed, EST, and genomic sequencing
approaches (and in particular, with the accumulation of
sequence information from multiple genera, from multiple
species within genera, and from multiple individuals within
a species) there is an increasing need for methods that
rapidly and effectively permit the functions of nucleic
acid sequences to be elucidated. And as such functional
information accumulates, there is a further need for
methods of storing such functional information in
meaningful and useful relationship to the sequence itself;
that is, there is an increasing need for means and
apparatus for annotating raw sequence data with known or
predicted functional information.

Although the increase in the pace of genomic
sequencing is due in large part to technological changes in
sequencing strategies and instrumentation, Service, Science
280:995 (1998); Pennisi, Science 283:1822-1823 (1999),
there is an important functional motivation as well.

While it was understood that the EST approach
would rarely be able to yield sequence information about
the noncoding portions of the genome, it now also appears
that the EST approach is capable of capturing only a
fraction of a genome's actual expression complexity.

For example, when the C. elegans genome was fully
sequenced, gene prediction algorithms identified over
19,000 potential genes, of which only 7,000 had been found
by EST sequencing. C. elegans Sequencing Consortium,
Science 282:2012 (1998). Analogously, the recently
completed sequence of chromosome 2 of Arabidopsis predicts
over 4000 genes, Lin et al., Nature 402:761 (1999), of
which only about 6% had previously been identified via EST
sequencing efforts. Although the human genome has the
greatest depth of EST coverage, it is still woefully short
of surrendering all of its genes. One recent estimate
suggests that the human genome contains more than 146,000
genes, which would at this point leave greater than half of
the genes undiscovered. It is now predicted that many
genes, perhaps 20 to 50%, will only be found by genomic
sequencing.

There is, therefore, a need for methods that
permit the functional regions of genomic sequence (and
most importantly, but not exclusively, regions that
function to encode genes) to be identified.

Much of the coding sequence of the human genome
is not homologous to known genes, making detection of open
reading frames ("ORFs") and predictions of gene function
difficult. Computational methods exist for predicting
coding regions in eukaryotic genomes. Gene prediction
programs such as GRAIL and GRAIL II, Uberbacher et al.,
Proc. Natl. Acad. Sci. USA 88(24):11261-5 (1991); Xu et
al., Genet. Eng. 16:241-53 (1994); Uberbacher et al.,
Methods Enzymol. 266:259-81 (1996); GENEFINDER, Solovyev et
al., Nucl. Acids. Res. 22:5156-63 (1994); Solovyev et al.,
Ismb 5:294-302 (1997); and GENESCAN, Burge et al., J. Mol.
Biol. 268:78-94 (1997), predict many putative genes without
known homology or function. Such programs are known,
however, to give high false positive rates. Burset et al.,
Genomics 34:353-367 (1996). Using a consensus obtained by
a plurality of such programs is known to increase the
reliability of calling exons from genomic sequence.
Ansari-Lari et al., Genome Res. 8(1):29-40 (1998).

Identification of functional genes from genomic
data remains, however, an imperfect art. For example, in
reporting the full sequence of human chromosome 21, the
Chromosome 21 Mapping and Sequencing Consortium reports
that prior bioinformatic estimates of human gene number may
need to be revised substantially downwards. Nature
405:311-319 (2000); Reeves, Nature 405:283-284 (2000).

Thus, there is a need for methods and apparatus
that permit the functions of the regions identified
bioinformatically (and specifically, that permit the
expression of regions predicted to encode protein) readily
to be confirmed experimentally.

Recently, the development of nucleic acid
microarrays has made possible the automated and highly
parallel measurement of gene expression. Reviewed in
Schena (ed.), DNA Microarrays: A Practical Approach
(Practical Approach Series), Oxford University Press (1999)
(ISBN: 0199637768); Nature Genet. 21(1)(suppl):1-60
(1999); Schena (ed.), Microarray Biochip: Tools and
Technology, Eaton Publishing Company/BioTechniques Books
Division (2000) (ISBN: 1881299376).

It is common for microarrays to be derived from
cDNA/EST libraries, either from those previously described
in the literature, such as those from the I.M.A.G.E.
consortium, Lennon et al., Genomics 33(1):151-2 (1996), or
from the construction of "problem specific" libraries
targeted at a particular biological question, R.S. Thomas
et al., Cancer Res. (in press). Such microarrays by
definition can measure expression only of those genes found
in EST libraries, and thus have not been useful as probes
for genes discovered solely by genomic sequencing.

The utility of using whole genome nucleic acid
microarrays to answer certain biological questions has been
demonstrated for the yeast Saccharomyces cerevisiae.
DeRisi et al., Science 278:680 (1997). The vast majority of
yeast nuclear genes, approximately 95%, however, are single
exon genes, i.e., lack introns, Lopez et al., RNA
5:1135-1137 (1999); Goffeau et al., Science 274:563-67 (1996),
permitting coding regions more readily to be identified.
Whole genome nucleic acid microarrays have not generally
been used to probe gene expression from more complex
eukaryotic genomes, and in particular from those averaging
more than one intron per gene.

Because bone marrow is the tissue in which blood
cells originate, diseases of the bone marrow are a
significant cause of human morbidity and mortality.
Increasingly, genetic factors are being found that
contribute to predisposition, onset, and/or aggressiveness
of most, if not all, of these diseases. Although mutations
in single genes have in some cases been identified as
causal (notably in the thalassemias and sickle cell anemia),
disorders of the bone marrow are, for the most part,
believed to have polygenic etiologies. There is a need for
methods and apparatus that permit prediction, diagnosis and
prognosis of diseases of the bone marrow, particularly
those diseases with polygenic etiology.






Summary of the Invention 



The present invention solves these and other
problems in the art by providing methods and apparatus for
predicting, confirming, and displaying functional
information derived from genomic sequence. The present
invention also provides apparatus for verifying the
expression of putative genes identified within genomic
sequence.

In particular, the invention provides novel
genome-derived single exon nucleic acid microarrays useful
for verifying the expression of putative genes identified
within genomic sequence.

The present invention also provides compositions
and kits for the ready production of nucleic acids
identical in sequence to, or substantially identical in
sequence to, probes on the genome-derived single exon
microarrays of the present invention.

Accordingly, in a first aspect of the invention,
there is provided a spatially-addressable set of single
exon nucleic acid probes for measuring gene expression in a
sample derived from human bone marrow, comprising a
plurality of single exon nucleic acid probes according to
any one of the nucleotide sequences set out in SEQ ID NOs:
1 - 13,114 or a complementary sequence, or a portion of
such a sequence.

By plurality is meant at least two, suitably at
least 20, most suitably at least 100, preferably at least
1000 and, most preferably, up to 5000.

In one embodiment of the first aspect, each of
said plurality of probes is separately and addressably
amplifiable.

In an alternative embodiment, each of said
plurality of probes is separately and addressably
isolatable from said plurality.

In a preferred embodiment, each of said plurality
of probes is amplifiable using at least one common primer.
Preferably, each of said plurality of probes is amplifiable
using a first and a second common primer.

In yet another embodiment, said set of single
exon nucleic acid probes comprises between 50 - 20,000
probes, for example, 50 - 5000.

Suitably, said set of single exon nucleic acid
probes comprises at least 50 - 1000 discrete single exon
nucleic acid probes having a sequence as set out in any of
SEQ ID NOs.: 1 - 26,012 or a complementary sequence, or a
portion of such a sequence.

Preferably, the average length of the single exon
nucleic acid probes is between 200 and 500 bp. It is
preferred that the average length should be at least 200bp,
suitably at least 250bp, most suitably at least 300bp,
preferably at least 400bp and, most preferably, 500 bp.

In another embodiment, the single exon nucleic
acid probes lack prokaryotic and bacteriophage vector
sequence. It is preferred that at least 50%, suitably at
least 60%, most suitably at least 70%, preferably at least
75%, more preferably at least 80, 85, 90, 95 or 99% of said
single exon nucleic acid probes lack prokaryotic and
bacteriophage vector sequence.

In another preferred embodiment, said single exon
nucleic acid probes lack homopolymeric stretches of A or T.
It is preferred that at least 50%, suitably at least 60%,
most suitably at least 70%, preferably at least 75%, more
preferably at least 80, 85, 90, 95 or 99% of said single
exon nucleic acid probes lack homopolymeric stretches of A
or T.

Preferably, a spatially-addressable set of single
exon nucleic acid probes in accordance with the first
aspect of the invention is addressably disposed upon a
substrate.

Suitable substrates include a filter membrane
which may, preferably, be nitrocellulose or nylon. The
nylon may, preferably, be positively-charged. Other suitable
substrates include glass, amorphous silicon, crystalline
silicon, and plastic. Further suitable materials include
polymethylacrylic, polyethylene, polypropylene,
polyacrylate, polymethylmethacrylate, polyvinylchloride,
polytetrafluoroethylene, polystyrene, polycarbonate,
polyacetal, polysulfone, cellulose acetate,
cellulose nitrate, nitrocellulose, and mixtures thereof.

In a second aspect of the invention, there is 
provided a microarray comprising a spatially addressable 
set of single exon nucleic acid probes in accordance with 
the first aspect of the invention. 

In one embodiment, a genome-derived single-exon
microarray is packaged together with such an ordered set of
amplifiable probes corresponding to the probes, or one or
more subsets of probes, thereon. In alternative
embodiments, the ordered set of amplifiable probes is
packaged separately from the genome-derived single exon
microarray.

In another aspect, the invention provides
genome-derived single exon nucleic acid probes useful for
gene expression analysis, and particularly for gene
expression analysis by microarray. In particular embodiments
of this aspect, the present invention provides human
single-exon probes that include specifically-hybridizable
fragments of SEQ ID NOs. 13,115 - 26,012, wherein the
fragment hybridizes at high stringency to an expressed human
gene. In particular embodiments, the invention provides
single exon probes comprising SEQ ID NOs. 1 - 13,114.

Accordingly, in a third aspect of the invention,
there is provided a single exon nucleic acid probe for
measuring human gene expression in a sample derived from
human bone marrow which is a nucleic acid molecule
comprising a nucleotide sequence as set out in any of SEQ
ID NOs.: 1 - 13,114 or a complementary sequence or a
fragment thereof wherein said probe hybridizes at high
stringency to a nucleic acid expressed in the human bone
marrow.

In one embodiment, a single exon nucleic acid
probe in accordance with the third aspect comprises a
nucleotide sequence as set out in any of SEQ ID NOs.:
13,115 - 26,012 or a complementary sequence or a fragment
thereof.

In a fourth aspect of the invention, there is
provided a single exon nucleic acid probe for measuring
human gene expression in a sample derived from human bone
marrow which is a nucleic acid molecule having a sequence
encoding a peptide comprising a peptide sequence as set out
in any of SEQ ID NOs.: 26,013 - 38,628 or a complementary
sequence or a fragment thereof wherein said probe
hybridizes at high stringency to a nucleic acid expressed
in the human bone marrow.

Preferably, a single exon nucleic acid probe in
accordance with the third or fourth aspects of the
invention comprises between at least 15 and 50 contiguous
nucleotides of said SEQ ID NO. It is preferred that the
single exon nucleic acid probe comprises at least 15,
suitably at least 20, more suitably at least 25 or
preferably at least 50 contiguous nucleotides of said SEQ
ID NO.

In another preferred embodiment, a single exon
nucleic acid probe in accordance with the third or fourth
aspects of the invention is between 3kb and 25kb in length.
It is preferred that said probe is no more than 3kb,
suitably no more than 5kb, more suitably no more than 10kb,
preferably 15kb, more preferably 20kb or, most preferably,
no more than 20kb in length.

Preferably, a single exon nucleic acid probe in
accordance with either the third or fourth aspect of the
invention is DNA, preferably single-stranded DNA, RNA or
PNA.

In another embodiment of either the third or
fourth aspect of the invention, a single exon nucleic acid
probe is detectably labeled. Suitable detectable labels
include a radionuclide, a fluorescent label or a first
member of a specific binding pair. Suitable fluorescent
labels include dyes such as cyanine dyes, preferably Cy3
and Cy5, although other suitable dyes will be known to those
skilled in the art.

In a particularly preferred embodiment, a single
exon nucleic acid probe in accordance with either the third
or fourth aspect of the invention lacks prokaryotic and
bacteriophage vector sequence. In yet another embodiment, a
single exon nucleic acid probe in accordance with either
the third or fourth aspect of the invention lacks
homopolymeric stretches of A or T.

In a fifth aspect of the invention, there is
provided an amplifiable nucleic acid composition,
comprising:

the single exon nucleic acid probe in accordance
with either of the third or fourth aspects of the
invention; and at least one nucleic acid primer;

wherein said at least one primer is sufficient to
prime enzymatic amplification of said probe.

In a sixth aspect of the invention, there is
provided a method of measuring gene expression in a sample
derived from human bone marrow, comprising:

contacting the single exon microarray in
accordance with the second aspect of the invention, with a
first collection of detectably labeled nucleic acids, said
first collection of nucleic acids derived from mRNA of
human bone marrow; and then

measuring the label detectably bound to each
probe of said microarray.

In a seventh aspect of the invention, there is
provided a method of identifying exons in a eukaryotic
genome, comprising:

algorithmically predicting at least one exon from
genomic sequence of said eukaryote; and then

detecting specific hybridization of detectably
labeled nucleic acids to a single exon probe,

wherein said detectably labeled nucleic acids are
derived from mRNA from the bone marrow of said eukaryote,
said probe is a single exon probe having a fragment
identical in sequence to, or complementary in sequence to,
said predicted exon, said probe is included within a single
exon microarray in accordance with the first aspect of the
invention, and said fragment is selectively hybridizable at
high stringency.

In an eighth aspect of the invention, there is
provided a method of assigning exons to a single gene,
comprising:

identifying a plurality of exons from genomic
sequence in accordance with the seventh aspect of the
invention; and then

measuring the expression of each of said exons in
a plurality of tissues and/or cell types using
hybridization to single exon microarrays having a probe
with said exon,

wherein a common pattern of expression of said
exons in said plurality of tissues and/or cell types
indicates that the exons should be assigned to a single
gene.
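
By way of non-limiting illustration only, the assignment of
exons that share a common pattern of expression can be
sketched as follows. The sketch assumes that the expression
of each exon has already been measured in the same ordered
set of tissues. The use of Pearson correlation as the measure
of a "common pattern", the threshold of 0.9, the greedy
grouping, and the names pearson and group_exons are all
assumptions made for illustration; none is taken from the
methods described herein.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sd_x * sd_y)


def group_exons(expression, threshold=0.9):
    """Greedily group exons whose profiles correlate with every group member.

    expression maps an exon name to its expression values, one value per
    tissue, with the tissues in the same order for every exon.
    The threshold is an illustrative assumption.
    """
    groups = []
    for exon, profile in expression.items():
        for group in groups:
            if all(pearson(profile, expression[m]) >= threshold for m in group):
                group.append(exon)
                break
        else:
            groups.append([exon])  # no compatible group: start a new one
    return groups


# Hypothetical measurements in four tissues.
example = {
    "exonA": [1, 5, 2, 8],
    "exonB": [2, 10, 4, 16],  # same pattern as exonA at twice the signal
    "exonC": [9, 1, 7, 2],    # roughly the opposite pattern
}
candidate_genes = group_exons(example)
```

In this sketch each resulting group is a candidate single
gene; in practice the genomic proximity of the exons would
presumably also be taken into account.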

In a ninth aspect of the invention, there is
provided a nucleic acid sequence as set out in any of SEQ
ID NOs: 1 - 26,012 wherein said sequence encodes a peptide.

In a tenth aspect of the invention, there is
provided a peptide encoded by a sequence comprising a
sequence as set out in any of SEQ ID NOs: 13,115 - 26,012,
or a complementary sequence or coding portion thereof.

In a preferred embodiment, a peptide may be
encoded by a sequence comprising a sequence set out in any
of SEQ ID NOs.: 1 - 13,114.

In a further aspect, the invention provides
peptides comprising an amino acid sequence translated from
the DNA fragments, said amino acid sequences comprising SEQ
ID NOs.: 26,013 - 38,628.

Accordingly, in an eleventh aspect of the invention
there is provided a peptide comprising a sequence as set
out in any of SEQ ID NOs: 26,013 - 38,628, or fragment
thereof.

In another aspect, the invention provides means
for displaying annotated sequence, and in particular, for
displaying sequence annotated according to the methods and
apparatus of the present invention. Further, such display
can be used as a preferred graphical user interface for
electronic search, query, and analysis of such annotated
sequence.



Detailed Description of the Invention 

Definitions

As used herein, the term "microarray" and phrase
"nucleic acid microarray" refer to a substrate-bound
collection of plural nucleic acids, hybridization to each
of the plurality of bound nucleic acids being separately
detectable. The substrate can be solid or porous, planar
or non-planar, unitary or distributed.

As so defined, the term "microarray" and phrase
"nucleic acid microarray" include all the devices so called
in Schena (ed.), DNA Microarrays: A Practical Approach
(Practical Approach Series), Oxford University Press (1999)
(ISBN: 0199637768); Nature Genet. 21(1)(suppl):1-60
(1999); and Schena (ed.), Microarray Biochip: Tools and
Technology, Eaton Publishing Company/BioTechniques Books
Division (2000) (ISBN: 1881299376). As so defined, the
term "microarray" and phrase "nucleic acid microarray"
further include substrate-bound collections of plural
nucleic acids in which the nucleic acids are distributably
disposed on a plurality of beads, rather than on a unitary
planar substrate, as is described, inter alia, in Brenner
et al., Proc. Natl. Acad. Sci. USA 97(4):1665-1670 (2000);
in such case, the term "microarray" and phrase "nucleic
acid microarray" refer to the plurality of beads in
aggregate.

As used herein with respect to a nucleic acid
microarray, the term "probe" refers to the nucleic acid
that is, or is intended to be, bound to the substrate; in
such context, the term "target" thus refers to nucleic acid
intended to be bound thereto by Watson-Crick
complementarity. As used herein with respect to solution
phase hybridization, the term "probe" refers to the nucleic
acid of known sequence that is detectably labeled.

As used herein, the expression "probe comprising
SEQ ID NO.", and variants thereof, intends a nucleic acid
probe, at least a portion of which probe has either (i) the
sequence directly as given in the referenced SEQ ID NO., or
(ii) a sequence complementary to the sequence as given in
the referenced SEQ ID NO., the choice as between sequence
directly as given and complement thereof dictated by the
requirement that the probe hybridize to mRNA.

As used herein, the term "open reading frame" and
the equivalent acronym "ORF" refer to that portion of an
exon that can be translated in its entirety into a sequence
of contiguous amino acids, i.e., a nucleic acid sequence
that, in at least one reading frame, does not possess stop
codons; the term does not require that the ORF encode the
entirety of a natural protein.
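
By way of non-limiting illustration only, the foregoing
definition can be sketched as a test for reading frames that
are free of stop codons. The sketch assumes the standard
genetic code (stop codons TAA, TAG and TGA) and considers
only the three forward reading frames of a DNA sequence; the
function names open_frames and is_orf are hypothetical and
introduced here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def open_frames(seq):
    """Return the forward frame offsets (0, 1, 2) that contain no stop codon."""
    seq = seq.upper()
    frames = []
    for offset in range(3):
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        if all(codon not in STOP_CODONS for codon in codons):
            frames.append(offset)
    return frames


def is_orf(seq):
    """True if at least one forward reading frame lacks stop codons."""
    return bool(open_frames(seq))


print(open_frames("ATGAAATTT"))  # [0, 2]: frame 1 begins with the stop codon TGA
```

Consistent with the definition, the sketch does not require
a start codon and does not require that the sequence encode
an entire protein.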

As used herein, the term "amplicon" refers to a 
PCR product amplified from human genomic DNA, containing 
the predicted exon. 

As used herein, the term "exon" refers to the
consensus prediction of the various exon and gene
predicting algorithms, i.e., a nucleic acid sequence
bioinformatically predicted to encode a portion of a
natural protein.
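
By way of non-limiting illustration only, one simple way of
forming a consensus from several prediction programs is to
retain those regions of the genomic sequence that at least a
minimum number of programs call as exon. The voting rule, the
default of two supporting programs, the half-open (start,
end) coordinate convention, and the name consensus_exons are
assumptions made for illustration and do not describe the
particular consensus method used herein. The sketch further
assumes that the intervals reported by any single program do
not overlap one another.

```python
def consensus_exons(predictions, min_votes=2):
    """Return regions called as exon by at least min_votes programs.

    predictions holds one list per program of half-open (start, end)
    intervals; intervals from a single program are assumed not to overlap.
    """
    events = []
    for intervals in predictions:
        for start, end in intervals:
            events.append((start, 1))   # a program's exon call opens
            events.append((end, -1))    # a program's exon call closes
    # At equal positions, closings (-1) sort before openings (+1).
    events.sort()

    consensus = []
    depth = 0            # number of programs calling the current position
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_votes <= depth:
            region_start = pos
        elif depth < min_votes <= previous:
            if pos > region_start:
                consensus.append((region_start, pos))
            region_start = None
    return consensus


# Hypothetical calls from three programs on one genomic fragment.
calls = [
    [(100, 200)],
    [(150, 250)],
    [(400, 500)],
]
supported = consensus_exons(calls)
```

Raising min_votes makes the consensus stricter, trading
sensitivity for a lower expected false positive rate.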

As used herein, the term "peptide" refers to a
sequence of amino acids. The sequences referred to as
PEPTIDE SEQ ID NOs. are the predicted peptide sequences
that would be translated from one of the exons, or a
portion thereof, set out in exon SEQ ID NOs. The codons
encoding the peptide are wholly contained within the exon.

As used herein, "portions" of a defined
nucleotide sequence or sequences can be and, preferably,
are fragments unique to that sequence or to one or a
combination of those sequences. A fragment unique to a
nucleic acid molecule is one that is a signature for the
larger nucleic acid molecule.

As used herein, the phrase "expression of a
probe" and its linguistic variants means that the ORF
present within the probe, or its complement, is present
within a target mRNA.

As used herein, "stringent conditions" refers to
parameters well known to those skilled in the art. When a
nucleic acid molecule is said to be hybridisable to another
of a given sequence under "stringent conditions" it is
meant that it is homologous to the given sequence.

As used herein, the phrase "specific binding
pair" intends a pair of molecules that bind to one another
with high specificity. Binding pairs are said to exhibit
specific binding when they exhibit avidity of at least 10^7,
preferably at least 10^8, more preferably at least 10^9
liters/mole. Nonlimiting examples of specific binding
pairs are: antibody and antigen; biotin and avidin; and
biotin and streptavidin.

As used herein with respect to the visual display
of annotated genomic sequence, the term "rectangle" means
any geometric shape that has at least a first and a second
border, wherein the first and second borders each are
capable of mapping uniquely to a point of another visual
object of the display.

As used herein, a "Mondrian" means a visual
display in which a single genomic sequence is annotated
with predicted and experimentally confirmed functional
information.


Brief Description of the Drawings 

The present invention is further illustrated with
reference to the following non-limiting figures and
examples in which:

FIG. 1 illustrates a process for predicting
functional regions from genomic sequence, confirming the
functional activity of such regions experimentally, and
associating and displaying the data so obtained in
meaningful and useful relationship to the original sequence
data;

FIG. 2 further elaborates that portion of the
process schematized in FIG. 1 for predicting functional
regions from genomic sequence;

FIG. 3 illustrates a Mondrian visual display;

FIG. 4 presents a Mondrian showing a hypothetical 
annotated genomic sequence; 

FIG. 5 is a histogram showing the distribution of
ORF length and PCR products as obtained, with ORF length
shown in black and PCR product length shown in dotted
lines;

FIG. 6 is a histogram showing the distribution,
among exons predicted according to the methods described,
of expression as measured using simultaneous two color
hybridization to a genome-derived single exon microarray.
The graph shows the number of sequence-verified products
that were either not expressed ("0"), expressed in one or
more but not all tested tissues ("1" - "9"), or expressed
in all tissues tested ("10");

FIG. 7 is a pictorial representation of the
expression of verified sequences that showed expression
with signal intensity greater than 3 in at least one
tissue, with: FIG. 7A showing the expression as measured by
microarray hybridization in each of the 10 measured
tissues, and the expression as measured "bioinformatically"
by query of EST, NR and SwissProt databases; with FIG. 7B
showing the legend for display of physical expression
(ratio) in FIG. 7A; and with FIG. 7C showing the legend for
scoring EST hits as depicted in FIG. 7A;

FIG. 8 shows a comparison of normalized Cy3
signal intensity for arrayed sequences that were identical
to sequences in existing EST, NR and SwissProt databases or
that were dissimilar (unknown), where black denotes the
signal intensity for all sequence-verified products with a
BLAST Expect ("E") value of greater than 1e-30 (1 x 10^-30)
("unknown") and a dotted line denotes sequence-verified
spots with a BLAST Expect ("E") value of less than 1e-30 (1
x 10^-30) ("known");

FIG. 9 presents a Mondrian of BAC AC008172 (bases
25,000 to 130,000), containing the carbamyl phosphate
synthetase gene (AF154830.1); and

FIG. 10 is a Mondrian of BAC A049839.

Methods and Apparatus for Predicting, Confirming,
Annotating, and Displaying Functional Regions From Genomic
Sequence Data






FIG. 1 is a flow chart illustrating in broad
outline a process for predicting functional regions from
genomic sequence, confirming and characterizing the
functional activity of such regions experimentally, and
then associating and displaying the information so obtained
in meaningful and useful relationship to the original
sequence data.

The initial input into process 10 of the present
invention is drawn from one or more databases 100
containing genomic sequence data. Because genomic sequence
is usually obtained from subgenomic fragments, the sequence
data typically will be stored in a series of records
corresponding to these subgenomic sequenced fragments.
Some fragments will have been catenated to form larger
contiguous sequences ("contigs"); others will not. A
finite percentage of sequence data in the database will
typically be erroneous, consisting inter alia of vector
sequence, sequence created from aberrant cloning events,
sequence of artificial polylinkers, and sequence that was
erroneously read.

Each sequence record in database 100 will
minimally contain as annotation a unique sequence
identifier (accession number), and will typically be
annotated further to identify the date of accession,
species of origin, and depositor. Because database 100 can
contain nongenomic sequence, each sequence will typically
be annotated further to permit query for genomic sequence.
Chromosomal origin, optionally with map location, can also
be present. Data can be, and over time increasingly will
be, further annotated with additional information, in part
through use of the present invention, as described below.
Annotation can be present within the data records, in
information external to database 100 and linked to the
records thereof, or through a combination of the two.

Databases useful as genomic sequence database 100
in the present invention include GenBank, and particularly
include several divisions thereof, including the
htgs (draft), NT (nucleotide, command line), and NR
(nonredundant) divisions. GenBank is produced by the
National Institutes of Health and is maintained by the
National Center for Biotechnology Information (NCBI).
Databases of genomic sequence from species other than
human, such as mouse, rat, Arabidopsis, C. elegans, C.
briggsae, Drosophila, zebrafish, and other higher
eukaryotic organisms will also prove useful as genomic
sequence database 100.

Genomic sequence obtained by query of genomic
sequence database 100 is then input into one or more
processes 200 for identification of regions therein that
are predicted to have a biological function as specified by
the user. Such functions include, but are not limited to,
encoding protein, regulating transcription, regulating
message transport after transcription into mRNA, regulating
message splicing after transcription into mRNA, or
regulating message degradation after transcription into
mRNA, and the like. Other functions include directing
somatic recombination events, contributing to chromosomal
stability or movement, contributing to allelic exclusion or
X chromosome inactivation, and the like.

The particular genomic sequence to be input into
process 200 will depend upon the function for which
relevant sequence is to be identified as well as upon the
approach chosen for such identification. Process step 200
can be iterated to identify different functions within a
given genomic region. In such case, the input often will
be different for the several iterations.

Sequences predicted to have the requisite
function by process 200 are then input into process 300,
where a subset of the input sequences suitable for
experimental confirmation is identified. Experimental
confirmation can involve physical and/or bioinformatic
assay. Where the subsequent experimental assay is
bioinformatic, rather than physical, there are fewer
constraints on the sequences that can be tested, and in
this latter case therefore process 300 can output the
entirety of the input sequence.

The subset of sequences output from process 300
is then used in process 400 for experimental verification
and characterization of the function predicted in
process 200, which experimental verification can, and often
will, include both physical and bioinformatic assay.

Process 500 annotates the sequence data with the
functional information obtained in the physical and/or
bioinformatic assays of process 400. Such annotation can
be done using any technique that usefully relates the
functional information to the sequence, as, for example, by
incorporating the functional data into the sequence data
record itself, by linking records in a hierarchical or
relational database, by linking to external databases, by a
combination thereof, or by other means well known within
the database arts. The data can even be submitted for
incorporation into databases maintained by others, such as
GenBank, which is maintained by NCBI.

As further noted in FIG. 1, additional annotation 
can be input into process 500 from external sources 600. 

The annotated data is then displayed in process
800, either before, concomitantly with, or after optional
storage 700 on nontransient media, such as magnetic disk,
optical disc, magnetooptical disk, flash memory, or the
like.

FIG. 1 shows that the experimental data output
from process 400 can be used in each preceding step of
process 10: e.g., facilitating identification of functional
sequences in process 200, facilitating identification of an
experimentally suitable subset thereof in process 300, and
facilitating creation of physical and/or informational
substrates for, and performance of subsequent assay of,
functional sequences in process 400.

Information from each step can be passed directly
to the succeeding process, or stored in permanent or
interim form prior to passage to the succeeding process.
Often, data will be stored after each, or at least a
plurality, of such process steps. Any or all process steps
can be automated.

FIG. 2 further elaborates the prediction of 
functional sequence within genomic sequence according to 
process 200.

Genomic sequence database 100 is first queried 20 
for genomic sequence. 

The sequence required to be returned by query 20 
will depend, in the first instance, upon the function to be 
identified.

For example, genomic sequences that function to
encode protein can be identified inter alia using gene
prediction approaches, comparative sequence analysis
approaches, or combinations of the two. In gene prediction
analysis, sequence from one genome is input into process
200 where at least one, preferably a plurality, of
algorithmic methods are applied to identify putative coding
regions. In comparative sequence analysis, by contrast,
corresponding, e.g., syntenic, sequence from a plurality of
sources, typically a plurality of species, is input into
process 200, where at least one, possibly a plurality, of
algorithmic methods are applied to compare the sequences
and identify regions of least variability.

The exact content of query 20 will also depend
upon the database queried. For example, if the database
contains both genomic and nongenomic sequence, perhaps
derived from multiple species, and the function to be
identified is protein coding in human genomic
sequence, the query will accordingly require that the
sequence returned be genomic and derived from humans.

Query 20 can also incorporate criteria that
compel return of sequence that meets operative requirements
of the subsequent analytical method. Alternatively, or in
addition, such operative criteria can be enforced in
subsequent preprocess step 24.

For example, if the function sought to be
identified is protein coding, query 20 can incorporate
criteria that return from genomic sequence database 100
only those sequences present within contigs sufficiently
long as to have obviated substantial fragmentation of any
given exon among a plurality of separate sequence
fragments.

Such criteria can, for example, consist of a
required minimal individual genomic sequence fragment
length, such as 10 kb, more typically 20 kb, 30 kb, 40 kb,
and preferably 50 kb or more, as well as an optional
further or alternative requirement that sequence from any
given clone, such as a bacterial artificial chromosome
("BAC"), be presented in no more than a finite maximal
number of fragments, such as no more than 20 separate
pieces, more typically no more than 15 fragments, even more
typically no more than about 10 - 12 fragments.

Results using the present invention have shown
that genomic sequence from bacterial artificial chromosomes
(BACs) is sufficient for gene prediction analysis according
to the present invention if the sequence is at least 50 kb
in length, and if additionally the sequence from any given
BAC is presented in fewer than 15, and preferably fewer
than 10, fragments. Accordingly, query 20 can incorporate
a requirement that data accessioned from BAC sequencing be
in fewer than 15, preferably fewer than 10, fragments.
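
By way of non-limiting illustration, the length and
fragment-count criteria just described can be sketched in
Python as follows. The record layout (BacRecord), the
function name passes_query, and the reading of "at least
50 kb" as the summed length of all fragments of a BAC are
assumptions made for this sketch and are not drawn from the
disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacRecord:
    """Hypothetical record: one BAC and the lengths of its sequenced fragments."""
    accession: str
    fragment_lengths: List[int]  # length of each fragment, in base pairs


def passes_query(record, min_total_bp=50_000, max_fragments=15):
    """True if the BAC has at least min_total_bp of sequence (assumed to mean
    the summed fragment length) presented in fewer than max_fragments pieces."""
    total_bp = sum(record.fragment_lengths)
    return total_bp >= min_total_bp and len(record.fragment_lengths) < max_fragments


records = [
    BacRecord("BAC_A", [60_000, 40_000]),   # long, 2 fragments
    BacRecord("BAC_B", [3_000] * 20),       # long enough, but 20 fragments
    BacRecord("BAC_C", [20_000, 10_000]),   # too short
]
selected = [r.accession for r in records if passes_query(r)]
```

Setting max_fragments=10 applies the stricter, preferred
limit.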

An additional criterion that can be incorporated
into the query can be the date, or range of dates, of
sequence accession. Although the process has been
described above as if genomic sequence database 100 were
static, it is of course understood that the genomic
sequence databases need not be static, and indeed are
typically updated on a frequent, even hourly, basis. Thus,
as further described in Examples 1 and 2, infra, it is
possible to query the database for newly added sequence,
either newly added after an absolute date, or newly added
relative to a prior analysis performed using the methods
and apparatus of the present invention. In this way, the
process herein described can incorporate a dynamic,
temporal component.
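
A temporal criterion of this kind might be sketched as
shown below. The tuple layout, the identifiers REC_1 to
REC_3, and the dates are invented for illustration; the
cutoff can be either an absolute date or the date of a
prior analysis.

```python
from datetime import date


def newly_added(records, cutoff):
    """Return accession identifiers of records added strictly after `cutoff`.

    `records` is a list of (accession, accession_date) tuples; this layout
    is an assumption of the sketch.
    """
    return [accession for accession, added in records if added > cutoff]


records = [
    ("REC_1", date(2000, 1, 15)),
    ("REC_2", date(2000, 6, 30)),
    ("REC_3", date(2000, 9, 27)),
]
last_analysis = date(2000, 6, 1)   # date of a hypothetical prior analysis
fresh = newly_added(records, last_analysis)
```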

One utility of such temporal limitation is to
identify, from newly accessioned genomic sequence, the
presence of novel genes, particularly those not previously
identified by EST sequencing (or other sequencing efforts
that are similarly based upon gene expression). As further
described in Example 1, such an approach has shown that
newly accessioned human genomic sequence, when analyzed for
sequences that function to encode protein, readily
identifies genes that are novel over those in existing EST
and other expression databases. This makes the methods of
the present invention extremely powerful gene discovery
tools. And as would be appreciated, such gene discovery
can be performed using genomic sequence from species other
than human.

If query 20 incorporates multiple criteria, such
as above-described, the multiple criteria can be applied
as a series of separate queries or as a single query,
depending in part upon the query language, the complexity
of the query, and other considerations well known in the
database arts.

If query 20 returns no genomic sequence meeting
the query criteria, the negative result can be reported by
process 22, and process 200 (and indeed, entire process 10)
ended 23, as shown. Alternatively, or in addition to
report and termination of the initial inquiry, a new query
20 can be generated that takes into account the initial
negative result.

When query 20 returns sequence meeting the query
criteria, the returned sequence is then passed to optional
preprocessing 24, suitable and specific for the desired
analytical approach and the particular analytical methods
thereof to be used in process 25.

Preprocessing 24 can include processes suitable
for many approaches and methods thereof, as well as
processes specifically suited for the intended subsequent
analysis.

Preprocessing 24 suitable for most approaches and
methods will include elimination of sequence irrelevant to,
or that would interfere with, the subsequent analysis.
Such sequence includes repetitive sequence, such as Alu
repeats and LINE elements, vector sequence, artificial
sequence, such as artificial polylinkers, and the like.
Such removal can readily be performed by identification and
subsequent masking of the undesired sequence.

Identification can be effected by comparing the
genomic sequence returned by query 20 with public or
private databases containing known repetitive sequence,
vector sequence, artificial sequence, and other artifactual
sequence. Such comparison can readily be done using
programs well known in the art, such as CROSS_MATCH, or by
proprietary sequence comparison programs the engineering of
which is well within the skill in the art.

Alternatively, or in addition, undesirable,
including artifactual, sequence can be identified
algorithmically without comparison to external databases
and thereafter removed. For example, synthetic polylinker
sequence can be identified by an algorithm that identifies
a significantly higher than average density of known
restriction sites. As another example, vector sequence can
be identified by algorithms that identify nucleotide or
codon usage at variance with that of the bulk of the
genomic sequence.
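
One possible form of the restriction-site density test is
sketched below, purely by way of example. The site list
(eight common six-base recognition sequences), the window
size, and the threshold of four sites per window are
illustrative assumptions; the disclosure does not specify
them.

```python
# Recognition sequences of EcoRI, BamHI, HindIII, PstI, SalI, XbaI, SmaI, KpnI.
SITES = ("GAATTC", "GGATCC", "AAGCTT", "CTGCAG",
         "GTCGAC", "TCTAGA", "CCCGGG", "GGTACC")


def site_positions(seq, sites=SITES):
    """Return the sorted 0-based start positions of every listed site in seq."""
    seq = seq.upper()
    hits = []
    for site in sites:
        start = seq.find(site)
        while start != -1:
            hits.append(start)
            start = seq.find(site, start + 1)
    return sorted(hits)


def dense_windows(seq, window=60, min_sites=4, sites=SITES):
    """Return start positions of windows holding at least min_sites site starts.

    window and min_sites are arbitrary values chosen for this sketch.
    """
    hits = site_positions(seq, sites)
    flagged = []
    for w_start in range(0, max(1, len(seq) - window + 1)):
        count = sum(1 for h in hits if w_start <= h < w_start + window)
        if count >= min_sites:
            flagged.append(w_start)
    return flagged


background = "ACGT" * 50                           # contains none of the sites
polylinker = "GAATTCGGATCCAAGCTTCTGCAGGTCGAC"      # five adjacent sites
flagged = dense_windows(background + polylinker + background)
```

Windows flagged in this way could then be passed to the
masking step as candidate polylinker sequence.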

Once identified, undesired sequence can be
removed. Removal can usefully be done by masking the
undesired sequence as, for example, by converting each
specific nucleotide symbol to one that is unrecognized
by the subsequent bioinformatic algorithms, such as "X".
Alternatively, but at present less preferred, the undesired
sequence can be excised from the returned genomic sequence,
leaving gaps.
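
Masking of this kind can be sketched as follows. The
function name and the use of 0-based, half-open intervals
are assumptions of the example.

```python
def mask_intervals(seq, intervals, mask_char="X"):
    """Replace every base inside the given intervals with mask_char.

    Intervals are (start, end) pairs, 0-based and half-open. The masked
    sequence keeps its original length, so downstream coordinates still
    refer to the original genomic sequence.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


masked = mask_intervals("ACGTACGTAC", [(2, 5)])
```

Preserving length is one reason masking may be preferred
over excision: positions reported by later analysis need no
remapping.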

Preprocessing 24 can further include selection
from among duplicative sequences of that one sequence of
highest quality. Higher quality can be measured as a lower
percentage of, fewest number of, or least densely clustered
occurrence of ambiguous nucleotides, defined as those
nucleotides that are identified in the genomic sequence
using symbols indicating ambiguity. Higher quality can
also or alternatively be valued by presence in the longest
contig.
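
As a minimal sketch of the first quality measure named
above (lowest percentage of ambiguous nucleotides), one
might write the following. Treating the IUPAC ambiguity
codes as the "symbols indicating ambiguity" is an assumption
of the example; the other measures (count, clustering,
contig length) are not shown.

```python
# IUPAC nucleotide codes that denote more than one possible base.
AMBIGUOUS = set("NRYKMSWBDHV")


def ambiguity_fraction(seq):
    """Fraction of positions in seq that carry an ambiguity code."""
    if not seq:
        return 1.0  # treat an empty sequence as worst possible quality
    return sum(1 for base in seq.upper() if base in AMBIGUOUS) / len(seq)


def best_duplicate(candidates):
    """Pick the duplicate with the lowest fraction of ambiguous nucleotides."""
    return min(candidates, key=ambiguity_fraction)


chosen = best_duplicate(["ACGTNNNN", "ACGTACGN", "ACGTACGT"])
```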

Preprocessing 24 can, and often will, also
include formatting of the data as specifically appropriate
for passage to the analytical algorithms of process 25.
Such formatting can and typically will include, inter alia,
addition of a unique sequence identifier, either derived
from the original accession number in genomic sequence
database 100, or newly applied, and can further include
additional annotation. Formatting can include conversion
from one to another sequence listing standard, such as
conversion to or from FASTA or the like, depending upon the
input expected by the subsequent process.
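
For instance, conversion to FASTA with a unique identifier
and optional annotation in the header line might look like
the sketch below. The function name, the identifier, and
the 60-character line width are illustrative choices.

```python
def to_fasta(identifier, seq, annotation="", width=60):
    """Format one sequence as a FASTA record.

    The header carries the identifier and optional free-text annotation;
    the sequence is wrapped at `width` characters per line.
    """
    header = f">{identifier} {annotation}".rstrip()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


record = to_fasta("seq1", "ACGT" * 20)
```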

Preprocessing, which can be optional depending
upon the function desired to be identified and the
informational requirements of the methods for effecting
such identification, is followed by sequence processing 25,
where sequences with the desired function are identified
within the genomic sequence.

As mentioned above, such functions can include,
but are not limited to, encoding protein, regulating
transcription, regulating message transport after
transcription into mRNA, regulating message splicing after
transcription, or regulating message degradation, and the
like. Other functions include directing somatic
recombination events, contributing to chromosomal stability
or movement, contributing to allelic exclusion or X
chromosome inactivation, or the like.

The methods of the present invention are
particularly useful for gene discovery, that is, for
identifying, from genomic sequence, regions that function
to encode genes, and in a particularly useful embodiment,
for identifying regions that function to encode genes not
hitherto identified by expression-based or directed cloning
and sequencing. In conjunction with verification using the
novel single exon microarrays of the present invention, as
further described below, the methods herein described
become powerful gene discovery tools.

Accordingly, in a preferred embodiment of the
present invention, process 25 is used to identify putative
coding regions. Two preferred approaches in process 25 for
identifying sequence that encodes putative genes are gene
prediction and comparative sequence analysis.

Gene prediction can be performed using any of a
number of algorithmic methods, embodied in one or more
software programs, such as GRAIL, DICTION, and GENEFINDER,
that identify open reading frames (ORFs) using a variety
of heuristics. Comparative sequence analysis similarly can be
performed using any of a variety of known programs that
identify regions with lower sequence variability.

As further described in Example 1, below, gene
finding software programs yield a range of results. For
the newly accessioned human genomic sequence input in
Example 1, for example, GRAIL identified the greatest
percentage of genomic sequence as putative coding region,
2% of the data analyzed; GENEFINDER was second, calling 1%;
and DICTION yielded the least putative coding region, with
0.8% of genomic sequence called as coding region.

Increased reliability can be obtained when
consensus is required among several such methods. Although
discussed herein particularly with respect to exon calling,
consensus among methods will in general increase
reliability of predicting other functions as well.

Thus, as indicated by query 26, sequence 
processing 25, optionally with preprocessing 24, can be 
repeated with a different method, with consensus among such 
iterations determined and reported in process 27. 

Process 27 compares the several outputs for a
given input genomic sequence and identifies consensus among
the separately reported results. The consensus itself, as
well as the sequence meeting that consensus, is then stored
in process 29a, displayed in process 29b, and/or output to
process 300 for subsequent identification of a subset
thereof suitable for assay.

Multiple levels of consensus can be calculated
and reported by process 27. For example, as further
described in Example 1, infra, process 27 can report
consensus as between all specific pairs of methods of gene
prediction, as consensus among any one or more of the pairs
of methods of gene prediction, or as among all of the gene
prediction algorithms used. Thus, in Example 1, process 27
reported that GRAIL and GENEFINDER programs agreed on 0.7%
of genomic sequence, that GRAIL and DICTION agreed on 0.5%
of genomic sequence, and that the three programs together
agreed on 0.25% of the data analyzed. Put another way,
0.25% of the genomic sequence was identified by all three
of the programs as containing putative coding region.
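
The several levels of consensus can be illustrated with the
following sketch, which reports, for each program, each
pair of programs, and all programs together, the percentage
of input bases called as coding. The interval
representation and the toy coordinates are assumptions of
the example and do not reproduce the data of Example 1.

```python
from itertools import combinations


def covered(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def consensus_report(predictions, seq_len):
    """Map each tuple of program names to the percent of bases all of them call.

    `predictions` maps a program name to its list of predicted intervals.
    """
    cov = {name: covered(iv) for name, iv in predictions.items()}
    report = {}
    for name, bases in cov.items():
        report[(name,)] = 100 * len(bases) / seq_len
    for a, b in combinations(sorted(cov), 2):
        report[(a, b)] = 100 * len(cov[a] & cov[b]) / seq_len
    all_names = tuple(sorted(cov))
    report[all_names] = 100 * len(set.intersection(*cov.values())) / seq_len
    return report


toy = {
    "GRAIL": [(100, 120)],
    "GENEFINDER": [(110, 125)],
    "DICTION": [(115, 130)],
}
report = consensus_report(toy, seq_len=1000)
```

In this toy input the three-way consensus is smaller than
any pairwise consensus, mirroring the qualitative pattern
described above.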

Furthermore, consensus can be required among
different approaches to identifying a chosen function.

For example, if the function desired to be
identified is coding of protein sequence, and a first used
approach to exon calling is gene prediction, the process
can be repeated on the same input sequence, or subset
thereof, with another approach, such as comparative
sequence analysis. In such a case, where comparative
sequence analysis follows gene prediction, the comparison
can be performed not only on genomic nucleic acid sequence,
but additionally or alternatively can be performed on the
predicted amino acid sequence translated from the ORFs
previously identified by the gene prediction approach.

Although shown as an iterative process, the
multiple analyses required to achieve consensus can be done
in series, in parallel, or some combination thereof.

Predicted functional sequence, optionally
representing a consensus among a plurality of methods and
approaches for determination thereof, is passed to process
300 for identification of a subset thereof for functional
assay.

In the preferred embodiment of the methods of the
present invention, wherein the function sought to be
identified is protein coding, process 300 is used to
identify a subset thereof suitable for experimental
verification by physical and/or bioinformatic approaches.

For example, putative ORFs identified in process
200 can be classified, or binned, bioinformatically into
putative genes. This binning can be based inter alia upon
consideration of the average number of exons/gene in the
species chosen for analysis, upon density of exons that
have been called on the genomic sequence, and other
empirical rules. Thereafter, one or more among the
gene-specific ORFs can be chosen for subsequent use in gene
expression assay.
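
One simple empirical rule for such binning, offered only as
an example, is to group neighboring ORFs whose separation
falls below a chosen gap. The gap-based rule and the 10 kb
value are assumptions of this sketch; the disclosure names
exon number and exon density as considerations without
fixing a rule.

```python
def bin_orfs(orfs, max_gap=10_000):
    """Group ORFs, given as (start, end) pairs, into putative genes.

    Sorted ORFs are placed in the same bin when the gap between the end of
    the previous ORF and the start of the next is at most max_gap bases.
    """
    bins = []
    for start, end in sorted(orfs):
        if bins and start - bins[-1][-1][1] <= max_gap:
            bins[-1].append((start, end))
        else:
            bins.append([(start, end)])
    return bins


genes = bin_orfs([(40_000, 40_300), (100, 300), (5_000, 5_200)])
representatives = [gene[0] for gene in genes]  # e.g., one ORF chosen per bin
```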

Where such subsequent gene expression assay uses
amplified nucleic acid, considerations such as desired
amplicon length, primer synthesis requirements, putative
exon length, sequence GC content, existence of possible
secondary structure, and the like can be used to identify
and select those ORFs that appear most likely successfully
to amplify. Where subsequent gene expression assay relies
upon nucleic acid hybridization, whether or not using
amplified product, further considerations involving
hybridization stringency can be applied to identify that
subset of sequences that will most readily permit
sequence-specific discrimination at a chosen hybridization and wash
stringency. One particular such consideration is avoidance
of putative exons that span repetitive sequence; such
sequence can hybridize spuriously to nonspecific message,
reducing specific signal in the hybridization.
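
A filter applying three of these considerations (length, GC
content, and avoidance of masked repetitive sequence) might
be sketched as follows. All thresholds are hypothetical
values chosen for the example, and secondary structure and
primer requirements are not modeled.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def is_candidate(seq, min_len=100, gc_min=0.35, gc_max=0.65):
    """True if seq is long enough, has moderate GC content, and contains
    no masked ("X") positions, i.e. does not span masked repeat sequence."""
    if len(seq) < min_len:
        return False
    if "X" in seq.upper():
        return False
    return gc_min <= gc_fraction(seq) <= gc_max


keep = is_candidate("ACGT" * 50)
```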

For bioinformatic assay, there are fewer
constraints on the sequences that can be tested 
experimentally, and in this latter case therefore process 
300 can output the entirety of the input sequence. 

The subset of sequences identified by process 300
as suitable for use in assay is then used in process 400 to
create the physical and/or informational substrate for
experimental verification of the predictions made in
process 200, and thereafter to assay those substrates.

As mentioned, the methods of the present
invention are particularly useful for identifying potential
coding regions within genomic sequence. In a preferred
embodiment of process 400, therefore, the expression of the
sequences predicted to encode protein is verified. The
combination of the predictive and experimental methods
provides a powerful gene discovery engine.

Thus, in another aspect, the present invention
provides methods and apparatus for verifying the expression
of putative genes identified within genomic sequence. In
particular, the invention provides a novel method of
verifying gene expression in which expression of predicted
ORFs is measured and confirmed using a novel type of
nucleic acid microarray, the genome-derived single exon
nucleic acid microarrays of the present invention.

Putative ORFs as predicted by a consensus of gene
calling, particularly gene prediction, algorithms in
process 200, and as further identified as suitable by
process 300, are amplified from genomic DNA using the
polymerase chain reaction (PCR). Although PCR is
conveniently used, other amplification approaches can also
be used.

Amplification schemes can be designed to capture
the entirety of each predicted ORF in an amplicon with
minimal additional (that is, intronic or intergenic)
sequence. Because ORFs predicted from human genomic
sequence using the methods of the present invention differ
in length, such an approach results in amplicons of varying
length.

However, most predicted ORFs are shorter than 500
bp in length, and although amplicons of at least about 100
or 200 base pairs can be immobilized as probes on nucleic
acid microarrays, early experimental results using the
methods of the present invention have suggested that longer
amplicons, at least about 400 or 500 base pairs, are more
effective. Furthermore, certain advantages derive from
application to the microarray of amplicons of defined size.

Therefore, amplification schemes can
alternatively, and preferably, be designed to amplify
regions of defined size, preferably at least about 300, 400
or 500 bp, centered about each predicted ORF. Such an
approach results in a population of amplicons of limited
size diversity, but that typically contain intronic and/or
intergenic nucleic acid in addition to putative ORF.

Conversely, somewhat fewer than 10% of ORFs
predicted from human genomic sequence according to the
methods of the present invention exceed 500 bp in length.
Portions of such extended ORFs, preferably at least about
300, 400 or 500 bp in length, can be amplified. However, it
has been discovered that the percentage success at
amplifying pieces of such ORFs is low, and that such
putative exons are more effectively amplified when larger
fragments, at least about 1000 or 1500 bp, and even as
large as 2000 bp are amplified.

The putative ORFs selected in process 300 are
thus input into one or more primer design programs, such as
PRIMER3 (available online for use at
http://www-genome.wi.mit.edu/cgi-bin/primer/), with a goal
of amplifying at least about 500 base pairs of genomic
sequence centered within or about ORFs predicted to be no
more than about 500 bp, or at least about 1000 - 1500 bp of
genomic sequence for ORFs predicted to exceed 500 bp in
length. Primers with the requisite sequences can be
purchased commercially or synthesized by standard
techniques.
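
The choice of target region handed to a primer design
program can be sketched as below. This is not PRIMER3
itself; the function merely computes a centered window of
the sizes described above (500 bp for ORFs of no more than
500 bp, 1000 bp otherwise). The function name and the
clamping of the window to the ends of the sequence are
assumptions of the example.

```python
def amplicon_window(orf_start, orf_end, seq_len,
                    short_target=500, long_target=1000, cutoff=500):
    """Return (start, end) of the genomic region to amplify for one ORF.

    Coordinates are 0-based and half-open. ORFs of at most `cutoff` bp get
    a window of short_target bp; longer ORFs get long_target bp. The window
    is centered on the ORF midpoint and shifted to stay within the sequence.
    """
    orf_len = orf_end - orf_start
    target = short_target if orf_len <= cutoff else long_target
    center = (orf_start + orf_end) // 2
    start = center - target // 2
    end = start + target
    if start < 0:
        start, end = 0, min(target, seq_len)
    elif end > seq_len:
        start, end = max(0, seq_len - target), seq_len
    return start, end


window = amplicon_window(1000, 1200, seq_len=10_000)
```

For a 200 bp ORF the resulting 500 bp window necessarily
includes flanking intronic or intergenic sequence on both
sides.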

Conveniently, a first predetermined sequence can
be added commonly to the ORF-specific 5' primer and a
second, typically different, predetermined sequence
commonly added to each 3' ORF-unique primer. This serves
to immortalize the amplicon, that is, serves to permit
further amplification of any amplicon using a single set of
primers complementary respectively to the common 5' and
common 3' sequence elements. The presence of these
"universal" priming sequences further facilitates later
sequence verification, providing a sequence common to all
amplicons at which to prime sequencing reactions. The
common 5' and 3' sequences further serve to add a cloning
site should any of the ORFs warrant further study.

Such predetermined sequence is usefully at least
about 10, 12 or 15 nt in length, and usually does not
exceed about 25 nt in length. The "universal" priming
sequences used in the examples presented infra were each 16
nt long.
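
Addition of the common sequences can be sketched as
follows. The two 16-nt tails shown are arbitrary
placeholders and are not the sequences used in the
examples; the length check reflects the approximate 10 to
25 nt range stated above.

```python
# Placeholder 16-nt tails, invented for this sketch.
UNIVERSAL_5 = "TGTAAAACGACGGCCA"
UNIVERSAL_3 = "CAGGAAACAGCTATGA"


def tail_primers(orf_fwd, orf_rev, tail5=UNIVERSAL_5, tail3=UNIVERSAL_3):
    """Prepend the common tails to the 5' ends of the ORF-specific primers.

    Primers are written 5'->3', so each tail is placed before the
    ORF-specific portion. Raises ValueError for tails outside 10-25 nt.
    """
    for tail in (tail5, tail3):
        if not 10 <= len(tail) <= 25:
            raise ValueError("tail length should be about 10 to 25 nt")
    return tail5 + orf_fwd, tail3 + orf_rev


fwd, rev = tail_primers("ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG")
```

Because every amplicon then carries the same two terminal
sequences, a single primer pair matching the tails suffices
for reamplification or for priming sequencing reactions.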

The genomic DNA to be used as substrate for
amplification will come from the eukaryotic species from
which the genomic sequence data had originally been
obtained, or a closely related species, and can
conveniently be prepared by well known techniques from
somatic or germline tissue or cultured cells of the
organism. See, e.g., Short Protocols in Molecular Biology:
A Compendium of Methods from Current Protocols in
Molecular Biology, Ausubel et al. (eds.), 4th edition
(April 1999), John Wiley & Sons (ISBN: 047132938X) and
Maniatis et al., Molecular Cloning: A Laboratory Manual,
2nd edition (December 1989), Cold Spring Harbor Laboratory
Press (ISBN: 0879693096). Many such prepared genomic DNAs
are available commercially, with the human genomic DNAs
additionally having certification of donor informed
consent.

Although the intronic and intergenic material
flanking putative coding regions in the amplicons could
potentially interfere with hybridizations during microarray
experiments, we have found, surprisingly, that differential
expression ratios are not significantly affected. Rather,
the predominant effect of exon size is to alter the
absolute signal intensity, rather than its ratio. Equally
surprising, although the art had suggested that single exon
probes would not provide sufficient signal intensity for high
stringency hybridization analyses, we find that such probes
not only provide adequate signal, but have substantial
advantages, as herein described.

After partial purification, as by size exclusion
spin column, with or without confirmation as to amplicon
quality as by gel electrophoresis, each amplicon (single
exon probe) is disposed in an array upon a support
substrate.

Methods for creating microarrays by deposition
and fixation of nucleic acids onto support substrates are
well known in the art (reviewed by Schena et al., see
above).

Typically, the support substrate will be glass,
although other materials, such as amorphous or crystalline
silicon or plastics, can also be used. Such plastics
include polymethylacrylic, polyethylene, polypropylene,
polyacrylate, polymethylmethacrylate, polyvinylchloride,
polytetrafluoroethylene, polystyrene, polycarbonate,
polyacetal, polysulfone, celluloseacetate,
cellulosenitrate, nitrocellulose, or mixtures thereof.
Typically, the support will be rectangular,
although other shapes, particularly circular disks and even
spheres, present certain advantages. Particularly
advantageous alternatives to glass slides as support
substrates for array of nucleic acids are optical discs, as
described in WO 98/12559.

The amplified nucleic acids can be attached
covalently to a surface of the support substrate or, more
typically, applied to a derivatized surface in a chaotropic
agent that facilitates denaturation and adherence by
presumed noncovalent interactions, or some combination
thereof.

Robotic spotting devices useful for arraying
nucleic acids on support substrates can be constructed
using public domain specifications (The MGuide, version
2.0, http://cmgm.stanford.edu/pbrown/mguide/index.html), or
can conveniently be purchased from commercial sources
(MicroArray GenII Spotter and MicroArray GenIII Spotter,
Molecular Dynamics, Inc., Sunnyvale, CA). Spotting can
also be effected by printing methods, including those using
ink jet technology.

As is well known in the art, microarrays
typically also contain immobilized control nucleic acids.
For controls useful in providing measurements of background
signal for the genome-derived single exon microarrays of
the present invention, a plurality of E. coli genes can
readily be used. As further described in Example 1, 16 or
32 E. coli genes suffice to provide a robust measure of
background noise in such microarrays.
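
One way in which such control spots could be used to set a
background threshold is sketched below. The statistic (mean
plus three sample standard deviations), the signal values,
and the probe names are all assumptions made for the
example and are not taken from Example 1.

```python
import statistics


def background_threshold(control_signals, k=3.0):
    """Mean of the control-spot signals plus k sample standard deviations."""
    return statistics.mean(control_signals) + k * statistics.stdev(control_signals)


def above_background(probe_signals, control_signals, k=3.0):
    """Names of probes whose signal exceeds the control-derived threshold."""
    threshold = background_threshold(control_signals, k)
    return [name for name, signal in probe_signals.items() if signal > threshold]


# Sixteen invented control-spot intensities (arbitrary units).
controls = [90, 100, 110] * 5 + [100]
probes = {"probe_1": 500, "probe_2": 101}
detected = above_background(probes, controls)
```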

As is well known in the art, the amplified
product disposed in arrays on a support substrate to create
a nucleic acid microarray can consist entirely of natural
nucleotides linked by phosphodiester bonds, or
alternatively can include either nonnative nucleotides,
alternative internucleotide linkages, or both, so long as
complementary binding can be obtained in the hybridization.
If enzymatic amplification is used to produce the
immobilized probes, the amplifying enzyme will impose
certain further constraints upon the types of nucleic acid
analogs that can be generated.

Although particularly described herein as using
high density microarrays constructed on planar substrates,
the methods of the present invention for confirming the
expression of ORFs predicted from genomic sequence can use
any of the known types of microarrays, as herein defined,
including lower density planar arrays, and microarrays on
nonplanar, nonunitary, distributed substrates.

For example, gene expression can be confirmed
using hybridization to lower density arrays, such as those
constructed on membranes, such as nitrocellulose, nylon,
and positively-charged derivatized nylon membranes.
Further, gene expression can also be confirmed using
nonplanar, bead-based microarrays such as are described in
Brenner et al., Proc. Natl. Acad. Sci. USA 97(4):1665-1670
(2000); U.S. Patent No. 6,057,107; and U.S. Patent No.
5,736,330. In theory, a packed collection of such beads
provides in aggregate a higher density of nucleic acid
probe than can be achieved with spotting or lithography
techniques on a single planar substrate.

Planar microarrays on solid substrates, however, 
provide certain useful advantages, including high 
throughput and compatibility with existing readers. For 
example, each standard microscope slide can include at
least 1000, typically at least 2000, preferably 5000 and
up to 10,000 - 50,000 or more nucleic acid probes of
discrete sequence. The number of sequences deposited will 
depend on their required application. 

Each putative gene can be represented in the
array by a single predicted ORF. Alternatively, genes can
be represented by more than one predicted ORF. For
purposes of measuring differential splicing, more than one
predicted ORF will be provided for a putative gene. And as
is well known in the art, each probe of defined sequence,
representing a single predicted ORF, can be deposited in a
plurality of locations on a single microarray to provide
redundancy of signal.

The genome-derived single exon microarrays
described above differ in several fundamental and
advantageous ways from microarrays presently used in the
gene expression art, including (1) those created by
deposition of mRNA-derived nucleic acids, (2) those created
by in situ synthesis of oligonucleotide probes, and (3)
those constructed from yeast genomic DNA.

Most nucleic acid microarrays that are in use for
study of eukaryotic gene expression have as immobilized
probes nucleic acids that are derived - either directly or
indirectly - from expressed message. As discussed above,
it is common, for example, for such microarrays to be
derived from cDNA/EST libraries, either from those
previously described in the literature, see Lennon et al.,
or from the de novo construction of "problem specific"
libraries targeted at a particular biological question,
R.S. Thomas et al., Cancer Res. (in press). Such
microarrays are herein collectively denominated "EST
microarrays".

Such EST microarrays by definition can measure
expression only of those genes found in EST libraries,
shown herein to represent only a fraction of expressed
genes. Furthermore, such libraries - and thus microarrays
based thereupon - are biased by the tissue or cell type of
message origin, by the expression levels of the respective
genes within the tissues, and by the ability of the message
successfully to have been reverse-transcribed and cloned.

Thus, as further discussed in Example 1, the
methods of the present invention enable the identification
of sequences that do not appear in EST or other expression
databases; such sequences, subsequently arrayed for
expression measurements, could not, therefore, have been
represented as probes on an EST microarray. And as further
demonstrated in the examples, infra, the remaining
population of genes identified from genomic sequence by the
methods of the present invention - that is, the one third
of sequences that had previously been accessioned in EST or
other expression databases - are biased toward genes with
higher expression levels.

Representation of a message in an EST and/or cDNA
library depends upon the successful reverse transcription,
optionally but typically with subsequent successful
cloning, of the message. This introduces substantial bias
into the population of probes available for arraying in EST
microarrays.

In contrast, neither reverse transcription nor
cloning is required to produce the probes arrayed on the
genome-derived single exon microarrays of the present
invention. And although the ultimate deposition of a probe
on the genome-derived single exon microarray of the present
invention depends upon a successful amplification from
genomic material, a priori knowledge of the sequence of the
desired amplicon affords greater opportunity to recover any
given probe sequence recalcitrant to amplification than is
afforded by the requirement for successful reverse
transcription and cloning of unknown message in EST
approaches.

Thus, the genome-derived single exon microarrays 
of the present invention present a far greater diversity of 
probes for measuring gene expression, with far less bias, 
than do EST microarrays presently used in the art. 

As a further consequence of their ultimate origin
from expressed message, the probes in EST microarrays often
contain poly-A (or complementary poly-T) stretches derived
from the poly-A tail of mature mRNA. These homopolymeric
stretches contribute to cross-hybridization, that is, to a
spurious signal occasioned by hybridization to the
homopolymeric tail of a labeled cDNA that lacks sequence
homology to the gene-specific portion of the probe.

In contrast, the probes arrayed in the
genome-derived single exon microarrays of the present
invention lack homopolymeric stretches derived from message
polyadenylation, and thus can provide more specific signal.
Typically, at least about 50, 60 or 75% of the probes on
the genome-derived single exon microarrays of the present
invention lack homopolymeric regions consisting of A or T,
where a homopolymeric region is defined for purposes herein
as stretches of 25 or more, typically 30 or more, identical
nucleotides.
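
By way of illustration only, the homopolymeric-region criterion just defined (a run of 25 or more identical A or T nucleotides) can be checked computationally. The following Python sketch is not part of the described methods; the function names and the example sequences are hypothetical.

```python
import re


def has_homopolymer_at(seq: str, min_run: int = 25) -> bool:
    """Return True if seq contains a run of at least min_run A's or T's."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_run, min_run))
    return pattern.search(seq.upper()) is not None


def fraction_lacking_homopolymer(probes, min_run: int = 25) -> float:
    """Fraction of probes that contain no A or T run of min_run or more."""
    if not probes:
        raise ValueError("at least one probe sequence is required")
    clean = sum(1 for p in probes if not has_homopolymer_at(p, min_run))
    return clean / len(probes)


# Hypothetical probe sequences, for illustration only.
example_probes = [
    "ACGT" * 50,               # no homopolymeric region
    "ACGT" * 10 + "A" * 30,    # poly-A run of 30: flagged
    "GGCC" * 40 + "T" * 24,    # run of 24 T's: below the threshold
]
print(fraction_lacking_homopolymer(example_probes))
```

Raising min_run to 30 applies the stricter of the two thresholds mentioned above.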

A further distinction, which also affects the
specificity of hybridization, is occasioned by the typical
derivation of EST microarray probes from cloned material.
Because much of the probe material disposed as probes on
EST microarrays is excised or amplified from plasmid,
phage, or phagemid vectors, EST microarrays typically
include a fair amount of vector sequence, more so when the
probes are amplified, rather than excised, from the vector.

In contrast, the vast majority of probes in the
genome-derived single exon microarrays of the present
invention contain no prokaryotic or bacteriophage vector
sequence, having been amplified directly or indirectly from
genomic DNA. Typically, therefore, at least about 50, 60,
70 or 80% or more of individual exon-including probes
disposed on a genome-derived single exon microarray of the
present invention lack vector sequence, and particularly
lack sequences drawn from plasmids and bacteriophage.
Preferably, at least about 85, 90 or more than 90% of
exon-including probes in the genome-derived single exon
microarray of the present invention lack vector sequence.
With attention to removal of vector sequences through
preprocessing 24, percentages of vector-free exon-including
probes can be as high as 95 - 99%. The substantial absence
of vector sequence from the genome-derived single exon
microarrays of the present invention results in greater
specificity during hybridization, since spurious
cross-hybridization to a probe vector sequence is reduced.

As a further consequence of excision or
amplification of probes from vectors in construction of EST
microarrays, the probes arrayed thereon often contain
artificial sequence, derived from vector polylinker
multiple cloning sites, at both 5' and 3' ends. The probes
disposed upon the genome-derived single exon microarrays
need have no such artificial sequence appended thereto.

As mentioned above, however, the ORF-specific
primers used to amplify putative ORFs can include
artificial sequences, typically 5' to the ORF-specific
primer sequence, useful for "universal" (that is,
independent of ORF sequence) priming of subsequent
amplification or sequencing reactions. When such
"universal" 5' and/or 3' priming sequences are appended to
the amplification primers, the probes disposed upon the
genome-derived single exon microarray will include
artificial sequence similar to that found in EST
microarrays. However, the genome-derived single exon
microarray of the present invention can be made without
such sequences, and if so constructed, presents an even
smaller amount of nonspecific sequence that would
contribute to nonspecific hybridization.

Yet another consequence of typical use of cloned
material as probes in EST microarrays is that such
microarrays contain probes that result from cloning
artifacts, such as chimeric molecules containing coding
region of two separate genes. Derived from genomic
material, typically not thereafter cloned, the probes of
the genome-derived single exon microarrays of the present
invention lack such cloning artifacts, and thus provide
greater specificity of signal in gene expression
measurements.

A further consequence of the cloned origin of
probes on many EST microarrays is that the individual
probes often have disparate sizes, which can cause the
optimal hybridization stringency to vary among probes on a
single microarray. In contrast, as discussed above, the
probes arrayed on the genome-derived single exon
microarrays of the present invention can readily be
designed to have a narrow distribution in sizes, with the
range of probe sizes no greater than about 10% of the
average size, typically no greater than about 5% of the
average probe size.
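
As a hedged illustration of the size criterion just stated, the range of probe sizes can be expressed as a fraction of the average probe size and compared against the 10% or 5% bound. The Python sketch below takes "range" to mean the difference between the longest and shortest probe, which is an interpretive assumption; the function names and probe lengths are hypothetical.

```python
def size_range_fraction(lengths) -> float:
    """Return (longest - shortest) / mean length for a set of probe lengths."""
    if not lengths:
        raise ValueError("at least one probe length is required")
    mean = sum(lengths) / len(lengths)
    return (max(lengths) - min(lengths)) / mean


def within_size_tolerance(lengths, max_fraction: float = 0.10) -> bool:
    """True if the size range does not exceed max_fraction of the mean."""
    return size_range_fraction(lengths) <= max_fraction


# Hypothetical probe lengths in base pairs, for illustration only.
probe_lengths = [490, 500, 510, 505, 495]
print(size_range_fraction(probe_lengths))          # range 20 over mean 500
print(within_size_tolerance(probe_lengths, 0.05))  # stricter 5% bound
```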

Because of their origin from fully- or
partially-spliced message, probes disposed upon EST arrays
will often include multiple exons. The percentage of such
exon-spanning probes in an EST microarray can be calculated,
on average, based upon the predicted number of exons/gene
for the given species and the average length of the
immobilized probes. For human genes, the near-complete
sequence of human chromosome 22, Dunham et al., Nature
402(6761):489-95 (1999), predicts that human genes average
5.5 exons/gene. Even with probes of 200 - 500 bp, the vast
majority of human EST microarray probes include more than
one exon.

In contrast, by virtue of their origin from
algorithmically identified ORFs in genomic sequence, the
probes in the genome-derived single exon microarrays of the
present invention can consist of individual exons. Thus,
in contrast to EST microarrays, at least about 50, 60, 70,
75, 80, 85, 95 or 99% of probes deposited in the
genome-derived microarray of the present invention consist
of, or include, no more than one predicted ORF.

This provides the ability, not readily achieved
using EST microarrays, to use the genome-derived single
exon microarrays of the present invention to measure
tissue-specific expression of individual exons, which in
turn allows differential splicing events to be detected and
characterized, and in particular, allows the correlation of
differential splicing to tissue-specific expression
patterns.

Furthermore, the exons that are represented in
EST microarrays are often biased toward the 3' or 5' end of
their respective genes, since sequencing strategies used
for EST identification are so biased. In contrast, no such
3' or 5' bias necessarily inheres in the selection of exons
for disposition on the genome-derived single exon
microarrays of the present invention.

Conversely, the probes provided on the
genome-derived single exon microarrays of the present
invention typically, but need not necessarily, include
intronic and/or intergenic sequence that is absent from EST
microarrays, which are derived from mature mRNA.
Typically, at least about 50, 60, 70, 80 or 90% of the
exon-including probes on the genome-derived single exon
microarrays of the present invention include sequence drawn
from noncoding regions. As discussed above, the additional
presence of noncoding region does not significantly
interfere with measurement of gene expression, and provides
the additional opportunity to assay prespliced RNA, and
thus measure such phenomena as nuclear export control.
The genome-derived single exon microarrays of the
present invention are also quite different from in situ
synthesis microarrays, where probe size is severely
constrained by inadequacies in the photolithographic
synthesis process.

Typically, probes arrayed on in situ synthesis
microarrays are limited to a maximum of about 25 bp. As a
well known consequence, hybridization to such chips must be
performed at low stringency. In order, therefore, to
achieve unambiguous sequence-specific hybridization
results, the in situ synthesis microarray requires
substantial redundancy, with concomitant programmed
arraying for each probe of probe analogues with altered
(i.e., mismatched) sequence.

In contrast, the longer probe length of the
genome-derived single exon microarrays of the present
invention allows much higher stringency hybridization and
wash. Typically, therefore, exon-including probes on the
genome-derived single exon microarrays of the present
invention average at least about 100, 200, 300, 400 or
500 bp in length. By obviating the need for substantial
probe redundancy, this approach permits a higher density of
probes for discrete exons or genes to be arrayed on the
microarrays of the present invention than can be achieved
for in situ synthesis microarrays.

A further distinction is that the probes in in
situ synthesis microarrays typically are covalently linked
to the substrate surface. In contrast, the probes disposed
on the genome-derived microarray of the present invention
typically are, but need not necessarily be, bound
noncovalently to the substrate.

Furthermore, the short probe size on in situ
microarrays causes large percentage differences in the
melting temperature of probes hybridized to their
complementary target sequence, and thus causes large
percentage differences in the theoretically optimum
stringency across the array as a whole.

In contrast, the larger probe size in the
microarrays of the present invention creates lower
percentage differences in melting temperature across the
range of arrayed probes.

A further significant advantage of the
microarrays of the present invention over in situ
synthesized arrays is that the quality of each individual
probe can be confirmed before deposition. In contrast, the
quality of probes cannot be assessed on a probe-by-probe
basis for the in situ synthesized microarrays presently
being used.

The genome-derived single exon microarrays of the
present invention are also distinguished over, and present
substantial benefits over, the genome-derived microarrays
from lower eukaryotes such as yeast. Lashkari et al.,
Proc. Natl. Acad. Sci. USA 94:13057-13062 (1997).

Only about 220 - 250 of the 6100 or so nuclear
genes in Saccharomyces cerevisiae - that is, only about 4
- 5% - have standard, spliceosomal, introns, Lopez et al.,
Nucl. Acids Res. 28:85-86 (2000); Spingola et al., RNA
5(2):221-34 (1999). Furthermore, the entire yeast genome
has already been sequenced. These two facts permit the
ready amplification and disposition of single-ORF amplicons
on such microarrays without the requirement for antecedent
use of gene prediction and/or comparative sequence
analyses.

Thus, a significant aspect of the present
invention is the ability to identify and to confirm
expression of predicted coding regions in genomic sequence
drawn from eukaryotic organisms that have a higher
percentage of genes having introns than do yeast such as
Saccharomyces cerevisiae, particularly in genomic sequence
drawn from eukaryotes in which at least about 10, 20 or 50%
of protein-encoding genes have introns. In preferred
embodiments, the methods and apparatus of the present
invention are used to identify and confirm expression of
novel genes from genomic sequence of eukaryotes in which
the average number of introns per gene is at least about
one, two or three or more.

After the physical substrate is prepared, 
experimental verification of predicted function is 
performed. 

In a preferred embodiment of the present
invention, where the function sought to be identified in
genomic sequence is protein coding, experimental
verification is performed by measuring expression of the
putative ORFs, typically through nucleic acid hybridization
experiments, and in particularly preferred embodiments,
through hybridization to genome-derived single exon
microarrays prepared as above-described.

Expression is conveniently measured and expressed
for each probe in the microarray as a ratio of the
expression measured concurrently in a plurality of mRNA
sources, according to techniques well known in the
microarray art, reviewed in Schena et al., and as further
described in Example 2, below. The mRNA source for the
reference against which specific expression is measured can
be drawn from a homogeneous mRNA source, such as a single
cultured cell-type, or alternatively can be heterogeneous,
as from a pool of mRNA derived from multiple tissues and/or
cell types, as further described in Example 2, infra.
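
Purely as an illustrative sketch, the ratio of index to reference signal for a probe can be computed after subtracting a background estimate taken from the control spots. In the Python below, the use of the median of control spots as the background, the floor applied to background-subtracted signal, the log2 scale, and all names and intensity values are assumptions made for illustration; they are not drawn from Example 2.

```python
import math
from statistics import median


def background(control_signals) -> float:
    """Estimate background as the median signal of the control spots."""
    return median(control_signals)


def log2_ratio(index_signal, reference_signal,
               index_bg, reference_bg, floor: float = 1.0) -> float:
    """log2 of background-subtracted index signal over reference signal.

    Subtracted signals are clamped to `floor` so the ratio stays defined
    when a spot is at or below background (an illustrative convention).
    """
    idx = max(index_signal - index_bg, floor)
    ref = max(reference_signal - reference_bg, floor)
    return math.log2(idx / ref)


# Hypothetical intensities in arbitrary scanner units.
index_bg = background([95, 100, 105, 100])      # index channel controls
reference_bg = background([98, 100, 102, 100])  # reference channel controls
print(log2_ratio(900, 300, index_bg, reference_bg))
```

A positive value indicates higher signal in the index source than in the reference; a value near zero indicates comparable signal in both.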

mRNA can be prepared by standard techniques, see
Ausubel et al. and Maniatis et al., or purchased
commercially. The mRNA is then typically
reverse-transcribed in the presence of labeled nucleotides:
the index source (that in which expression is desired to be
measured) is reverse transcribed in the presence of
nucleotides labeled with a first label, typically a
fluorophore (fluorochrome; fluor; fluorescent dye); the
reference source is reverse transcribed in the presence of
a second label, typically a fluorophore, typically
fluorometrically-distinguishable from the first label. As
further described in Example 2, infra, Cy3 and Cy5 dyes
prove particularly useful in these methods. After partial
purification of the index and reference targets,
hybridization to the probe array is conducted according to
standard techniques, typically under a coverslip.

After wash, microarrays are conveniently scanned
using a commercial microarray scanning device, such as a
Gen3 Scanner (Molecular Dynamics, Sunnyvale, CA). Data on
expression is then passed, with or without interim storage,
to process 500, where the results for each probe are
related to the original sequence.

Often, hybridization of target material to the
genome-derived single exon microarray will identify certain
of the probes thereon as of particular interest. Thus, it
is often desirable that the user be able readily to obtain
sufficient quantities of an individual probe, either for
subsequent arrayed deposition upon an additional support
substrate, often as part of a microarray having a plurality
of probes so identified, or alternatively or additionally
as a solitary solid-phase or solution-phase probe, for
further use.

Thus, in another aspect, the present invention
provides compositions and kits for the ready production of
nucleic acids identical in sequence to, or substantially
identical in sequence to, probes on the genome-derived
single exon microarrays of the present invention.

In this aspect, a small quantity of each probe is
disposed, typically without attachment to substrate, in a
spatially-addressable ordered set, typically one per well
of a microtiter dish. Although a 96 well microtiter plate
can be used, greater efficiency is obtained using higher
density arrays, such as are provided by microtiter plates
having 384, 864, 1536, 3456, 6144, or 9600 wells, and
although microtiter plates having physical depressions
(wells) are conveniently used, any device that permits
addressable withdrawal of reagent from
fluidly-noncommunicating areas can be used.
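
As one hedged illustration of spatial addressing, a probe's ordinal position in the ordered set can be mapped to a well address. The Python sketch below assumes that each listed plate format has a 2:3 ratio of rows to columns (e.g., 8 x 12 for 96 wells, 16 x 24 for 384 wells), that probes are laid down row by row, and that rows are lettered A, B, ... Z, AA, AB, and so on; these conventions and all function names are assumptions, not requirements of the invention.

```python
import math


def plate_shape(n_wells: int):
    """Return (rows, cols) for a plate assumed to have a 2:3 row:column layout."""
    rows = math.isqrt(n_wells * 2 // 3)
    cols = rows * 3 // 2
    if rows * cols != n_wells:
        raise ValueError(f"{n_wells} wells does not fit a 2:3 layout")
    return rows, cols


def row_label(row: int) -> str:
    """Zero-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    n = row
    while True:
        n, rem = divmod(n, 26)
        label = chr(ord("A") + rem) + label
        if n == 0:
            break
        n -= 1
    return label


def well_address(probe_index: int, n_wells: int) -> str:
    """Zero-based probe index to a well address, filling row by row."""
    rows, cols = plate_shape(n_wells)
    if not 0 <= probe_index < n_wells:
        raise IndexError("probe index is outside the plate")
    row, col = divmod(probe_index, cols)
    return f"{row_label(row)}{col + 1}"


print(well_address(0, 384))     # first well of a 384 well plate
print(well_address(383, 384))   # last well of a 384 well plate
```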

In this aspect of the invention, therefore, a
fluidly noncommunicating addressable ordered set of
individual probes, corresponding to those on a
genome-derived single exon microarray, is provided, with
each probe in sufficient quantity to permit amplification,
such as by PCR. As earlier mentioned, the ORF-specific
5' primers used for genomic amplification can have a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification can have a second,
different, common sequence added thereto, thus permitting,
in this preferred embodiment, the use of a single set of 5'
and 3' primers to amplify any one of the probes from the
amplifiable ordered set.
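
The primer arrangement just described can be sketched, by way of illustration only, as simple sequence concatenation: a first common sequence is placed 5' to each ORF-specific 5' primer, and a second, different common sequence is placed 5' to each ORF-specific 3' primer, so that every resulting amplicon is flanked by the same two sequences. All sequences and names in the Python below are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(fwd_specific, rev_specific, tail_5, tail_3):
    """Prepend the common tails to the ORF-specific primers (written 5'->3')."""
    if tail_5 == tail_3:
        raise ValueError("the two common sequences must differ")
    return tail_5 + fwd_specific, tail_3 + rev_specific


def universal_pair_matches(amplicon, tail_5, tail_3) -> bool:
    """True if the two common tails alone could prime this amplicon."""
    return amplicon.startswith(tail_5) and amplicon.endswith(revcomp(tail_3))


# Hypothetical placeholder sequences, for illustration only.
TAIL_5 = "ACGTTGCAAGCT"
TAIL_3 = "TTGACCGGTATC"
fwd, rev = tailed_primers("ATGGCCAAG", "TCAGGCTTC", TAIL_5, TAIL_3)

# Top strand of the product: forward primer, the intervening genomic
# sequence, then the reverse complement of the reverse primer.
amplicon = fwd + "GGGCCC" + revcomp(rev)
print(universal_pair_matches(amplicon, TAIL_5, TAIL_3))
```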

Each discrete amplifiable probe can also be
packaged with amplification primers, solutes, buffers,
etc., and can be provided in dry (e.g., lyophilized) form
or wet, in the latter case typically with addition of
agents that retard evaporation.

In another aspect of the present invention, a
genome-derived single-exon microarray is packaged together
with such an ordered set of amplifiable probes
corresponding to the probes, or one or more subsets of
probes, thereon. In alternative embodiments, the ordered
set of amplifiable probes is packaged separately from the
genome-derived single exon microarray.

In some embodiments, the microarray and/or
ordered probe set are further packaged with recordable
media that provide probe identification and addressing
information, and that can additionally contain annotation
information, such as gene expression data. Such recordable
media can be packaged with the microarray, with the ordered
probe set, or with both.

If the microarray is constructed on a substrate
that incorporates recordable media, such as is described in
international patent application no. WO 98/12559, then
separate packaging of the genome-derived single exon
microarray and the bioinformatic information is not
required.

The amount of amplifiable probe material should
be sufficient to permit at least one amplification
sufficient for subsequent hybridization assay.

Although the use of high density genome-derived
microarrays on solid planar substrates is presently a
preferred approach for the physical confirmation and
characterization of the expression of sequences predicted
to encode protein, other types of microarrays (as herein
defined) can also be used.

Furthermore, as earlier mentioned, experimental
verification of the function predicted from genomic
sequence in process 200 can be bioinformatic, rather than,
or additional to, physical verification.

For example, where the function desired to be 
identified is protein coding, the predicted ORFs can be 
compared bioinformatically to sequences known or suspected 
of being expressed.

Thus, the sequences output from process 300 (or
process 200), can be used to query expression databases,
such as EST databases, SNP ("single nucleotide
polymorphism") databases, known cDNA and mRNA sequences,
SAGE ("serial analysis of gene expression") databases, and
more generalized sequence databases that allow query for
expressed sequences. Such query can be done by any
sequence query algorithm, such as BLAST ("basic local
alignment search tool"). The results of such query -
including information on identical sequences and
information on nonidentical sequences that have diffuse or
focal regions of sequence homology to the query sequence -
can then be passed directly to process 500, or used to
inform analyses subsequently undertaken in process 200,
process 300, or process 400.

Experimental data, whether obtained by physical
or bioinformatic assay in process 400, is passed to process
500 where it is usefully related to the sequence data
itself, a process colloquially termed "annotation". Such
annotation can be done using any technique that usefully
relates the functional information to the sequence, as, for
example, by incorporating the functional data into the
record itself, by linking records in a hierarchical or
relational database, by linking to external databases, or
by a combination thereof. Such database techniques are
well within the skill in the art.
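
One of the annotation techniques mentioned above, linking records in a relational database, can be sketched as follows. This Python example uses an in-memory SQLite database; the table names, column names, accession string, and expression values are all hypothetical and serve only to show a sequence region record linked to its expression results.

```python
import sqlite3


def build_annotation_db() -> sqlite3.Connection:
    """Create an in-memory database with linked sequence and result tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequence_region (
            region_id INTEGER PRIMARY KEY,
            accession TEXT NOT NULL,
            start_nt  INTEGER NOT NULL,
            end_nt    INTEGER NOT NULL
        );
        CREATE TABLE expression_result (
            region_id  INTEGER NOT NULL REFERENCES sequence_region(region_id),
            tissue     TEXT NOT NULL,
            log2_ratio REAL NOT NULL
        );
    """)
    return conn


def annotate(conn: sqlite3.Connection, accession: str):
    """Return each region of the accession joined to its expression results."""
    cursor = conn.execute(
        "SELECT r.start_nt, r.end_nt, e.tissue, e.log2_ratio "
        "FROM sequence_region AS r "
        "JOIN expression_result AS e ON e.region_id = r.region_id "
        "WHERE r.accession = ? "
        "ORDER BY r.start_nt, e.tissue",
        (accession,),
    )
    return cursor.fetchall()


# Hypothetical records, for illustration only.
conn = build_annotation_db()
conn.execute("INSERT INTO sequence_region VALUES (1, 'HYPOTHETICAL_BAC_1', 1200, 1650)")
conn.executemany(
    "INSERT INTO expression_result VALUES (?, ?, ?)",
    [(1, "bone marrow", 1.5), (1, "brain", -0.5)],
)
print(annotate(conn, "HYPOTHETICAL_BAC_1"))
```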

The annotated sequence data can be stored 
locally, uploaded to genomic sequence database 100, and/or 
displayed 800. 

The methods and apparatus of the present
invention rapidly produce functional information from
genomic sequence. Coupled with the escalating pace at
which sequence now accumulates, the rapid pace of sequence
annotation produces a need for methods of displaying the
information in meaningful ways.

FIG. 3 shows visual display 80 presenting a
single genomic sequence annotated according to the present
invention. Because of its nominal resemblance to artistic
works of Piet Mondrian, visual display 80 is alternatively
described herein as a "Mondrian".

Each of the visual elements of display 80 is
aligned with respect to the genomic sequence being
annotated (hereinafter, the "annotated sequence"). Given
the number of nucleotides typically represented in an
annotated sequence, representation of individual
nucleotides would rarely be readable in hard copy output of
display 80. Typically, therefore, the annotated sequence
is schematized as rectangle 89, extending from the left
border of display 80 to its right border. By convention
herein, the left border of rectangle 89 represents the
first nucleotide of the sequence and the right border of
rectangle 89 represents the last nucleotide of the
sequence.

As further discussed below, however, the Mondrian
visual display of annotated sequence can serve as a
convenient graphical user interface for computerized
representation, analysis, and query of information stored
electronically. For such use, the individual nucleotides
can conveniently be linked to the X axis coordinate of
rectangle 89. This permits the annotated sequence at any
point within rectangle 89 readily to be viewed, either
automatically - for example, by time-delayed appearance of
a small overlaid window upon movement of a cursor or other
pointer over rectangle 89 - or through user intervention,
as by clicking a mouse or other pointing device at a point
in rectangle 89.
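
The linkage between individual nucleotides and the X axis coordinate of rectangle 89 can be illustrated, under stated assumptions, by a linear mapping in which the first nucleotide falls on the left border and the last on the right border. In the Python sketch below, pixel units, 1-based nucleotide positions, rounding to the nearest nucleotide, and all names are illustrative assumptions.

```python
def nucleotide_to_x(pos: int, seq_len: int, left_px: float, width_px: float) -> float:
    """Map a 1-based nucleotide position to an X coordinate.

    Position 1 maps to the left border and position seq_len to the right.
    """
    if seq_len < 2:
        return float(left_px)
    return left_px + (pos - 1) * width_px / (seq_len - 1)


def x_to_nucleotide(x: float, seq_len: int, left_px: float, width_px: float) -> int:
    """Map an X coordinate (e.g., a pointer location) back to a nucleotide."""
    fraction = (x - left_px) / width_px
    pos = round(fraction * (seq_len - 1)) + 1
    return min(max(pos, 1), seq_len)  # clamp to the annotated sequence


def region_extent(start: int, end: int, seq_len: int,
                  left_px: float, width_px: float):
    """Left and right X coordinates of a rectangle spanning start..end."""
    return (nucleotide_to_x(start, seq_len, left_px, width_px),
            nucleotide_to_x(end, seq_len, left_px, width_px))


# Hypothetical display: a 1001 nucleotide sequence drawn 500 px wide,
# with the left border of the rectangle at x = 50.
print(nucleotide_to_x(1, 1001, 50, 500))
print(x_to_nucleotide(300, 1001, 50, 500))
print(region_extent(251, 501, 1001, 50, 500))
```

The same mapping gives the left and right borders of any rectangle whose X-axis coordinates are to indicate the starting and ending nucleotides of a region.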

Visual display 80 is generated after user
specification of the genomic sequence to be displayed.
Such specification can consist of or include an accession
number for a single clone (e.g., a single BAC accessioned
into GenBank), wherein the starting and stopping
nucleotides are thus absolutely identified, or
alternatively can consist of or include an anchor or
fulcrum point about which a chosen range of sequence is
anchored, thus providing relative endpoints for the
sequence to be displayed. For example, the user can anchor
such a range about a given chromosomal map location, gene
name, or even a sequence returned by query for similarity
or identity to an input query sequence. When visual
display 80 is used as a graphical user interface to
computerized data, additional control over the first and
last displayed nucleotide will typically be dynamically
selectable, as by use of standard zooming and/or selection
tools.

Field 81 of visual display 80 is used to present
the output from process 200, that is, to present the
bioinformatic prediction of those sequences having the
desired function within the genomic sequence. Functional
sequences are typically indicated by at least one rectangle
83 (83a, 83b, 83c), the left and right borders of which
respectively indicate, by their X-axis coordinates, the
starting and ending nucleotides of the region predicted to
have function.

Where a single bioinformatic method or approach
identifies a plurality of regions having the desired
function, a plurality of rectangles 83 is disposed
horizontally in field 81. Where multiple methods and/or
approaches are used to identify function, each such method
and/or approach can be represented by its own series of
horizontally disposed rectangles 83, each such horizontally
disposed series of rectangles offset vertically from those
representing the results of the other methods and
approaches.

Thus, rectangles 83a in FIG. 3 represent the
functional predictions of a first method of a first
approach for predicting function, rectangles 83b represent
the functional predictions of a second method and/or second
approach for predicting that function, and rectangles 83c
represent the predictions of a third method and/or
approach.

Where the function desired to be identified is
protein coding, field 81 is used to present the
bioinformatic prediction of sequences encoding protein.
For example, rectangles 83a can represent the results from
GRAIL or GRAIL II, rectangles 83b can represent the results
from GENEFINDER, and rectangles 83c can represent the
results from DICTION.

Optionally, and preferably, rectangles 83
collectively representing predictions of a single method
and/or approach are identically colored and/or textured,
and are distinguishable from the color and/or texture used
for a different method and/or approach.

Alternatively, or in addition, the color, hue,
density, or texture of rectangles 83 can be used further to
report a measure of the bioinformatic reliability of the
prediction. For example, many gene prediction programs
will report a measure of the reliability of prediction.
Thus, increasing degrees of such reliability can be
indicated, e.g., by increasing density of shading. Where
display 80 is used as a graphical user interface, such
measures of reliability, and indeed all other results
output by the program, can additionally or alternatively be
made accessible through linkage from individual rectangles
83, as by time-delayed window ("tool tip" window), or by
pointer (e.g., mouse)-activated link.

As earlier described, increased predictive
reliability can be achieved by requiring consensus among
methods and/or approaches to determining function. Thus,
field 81 can include a horizontal series of rectangles 83
that indicate one or more degrees of consensus in
predictions of function.
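
One possible way to derive such a consensus series, offered only as a sketch, is to report the stretches of sequence covered by predictions from at least a chosen number of methods. The Python below treats each method's output as a list of inclusive, 1-based (start, end) intervals and assumes that intervals within a single method do not overlap one another; the function name and the example coordinates are hypothetical.

```python
def consensus_regions(predictions, min_methods: int):
    """Return intervals covered by at least min_methods prediction methods.

    predictions: one list of (start, end) intervals per method, with
    inclusive 1-based coordinates.
    """
    events = []
    for intervals in predictions:
        for start, end in intervals:
            events.append((start, 1))      # coverage rises at start
            events.append((end + 1, -1))   # and falls just past end
    # At equal positions, apply rises before falls so that abutting
    # intervals from different methods do not split a consensus region.
    events.sort(key=lambda e: (e[0], -e[1]))

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_methods <= depth:
            region_start = pos
        elif depth < min_methods <= previous:
            regions.append((region_start, pos - 1))
    return regions


# Hypothetical predictions from three methods (cf. rectangles 83a-83c).
method_a = [(100, 200)]
method_b = [(150, 250)]
method_c = [(180, 300)]
print(consensus_regions([method_a, method_b, method_c], 2))
print(consensus_regions([method_a, method_b, method_c], 3))
```

Running the function once per threshold yields one horizontal series for each degree of consensus.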

Although FIG. 3 shows three series of
horizontally disposed rectangles in field 81, display 80
can include as few as one such series of rectangles and as
many as can discriminably be displayed, depending upon the
number of methods and/or approaches used to predict a given
function.

Furthermore, field 81 can be used to show
predictions of a plurality of different functions.
However, the increased visual complexity occasioned by such
display makes more useful the ability of the user to select
a single function for display. When display 80 is used as
a graphical user interface for computer query and analysis,
such function can usefully be indicated and
user-selectable, as by a series of graphical buttons or
tabs (not shown in FIG. 3).

Rectangle 89 is shown in FIG. 3 as including
interposed rectangle 84. Rectangle 84 represents the
portion of annotated sequence for which predicted
functional information has been assayed physically, with
the starting and ending nucleotides of the assayed material
indicated by the X axis coordinates of the left and right
borders of rectangle 84. Rectangle 85, with optional
inclusive circles 86 (86a, 86b, and 86c) displays the
results of such physical assay.

Although a single rectangle 84 is shown in FIG. 
3, physical assay is not limited to just one region of 
annotated genomic sequence. It is expected that an 
increasing percentage of regions predicted to have function
by process 200 will be assayed physically, and that display
80 will accordingly, for any given genomic sequence, have
an increasing number of rectangles 84 and 85, representing
an increased density of sequence annotation.

Where the function desired to be identified is
protein coding, rectangle 84 identifies the sequence of the
probe used to measure expression. In embodiments of the
present invention where expression is measured using
genome-derived single exon microarrays, rectangle 84
identifies the sequence included within the probe
immobilized on the support surface of the microarray. As
noted supra, such probe will often include a small amount
of additional, synthetic, material incorporated during
amplification and designed to permit reamplification of the
probe, which sequence is typically not shown in display 80.

Rectangle 87 is used to present the results of
bioinformatic assay of the genomic sequence. For example,
where the function desired to be identified is protein
coding, process 400 can include bioinformatic query of
expression databases with the sequences predicted in
process 200 to encode exons. And as earlier discussed,
because bioinformatic assay presents fewer constraints than
does physical assay, often the entire output of process 200
can be used for such assay, without further subsetting
thereof by process 300. Therefore, rectangle 87 typically
need not have separate indicators therein of regions
submitted for bioinformatic assay; that is, rectangle 87
typically need not have regions therein analogous to
rectangles 84 within rectangle 89.

Rectangle 87 as shown in FIG. 3 includes smaller
rectangles 880 and 88. Rectangles 880 indicate regions
that returned a positive result in the bioinformatic assay,
with rectangles 88 representing regions that did not return
such positive results. Where the function desired to be
predicted and displayed is protein coding, rectangles 880
indicate regions of the predicted exons that identify
sequence with significant similarity in expression
databases, such as EST, SNP, and SAGE databases, with
rectangles 88 indicating genes novel over those identified
in existing expression databases.

Rectangles 880 can further indicate, through
color, shading, texture, or the like, additional
information obtained from bioinformatic assay.

For example, where the function assayed and
displayed is protein coding, the degree of shading of
rectangles 880 can be used to represent the degree of
sequence similarity found upon query of expression
databases. The number of levels of discrimination can be
as few as two (identity, and similarity, where similarity
has a user-selectable lower threshold). Alternatively, as
many different levels of discrimination can be indicated as
can visually be discriminated.
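
The mapping from sequence similarity to a discrete shading level can be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the function name, the use of fractional identity as the similarity measure, the default threshold, and the equal-width binning are all hypothetical.

```python
def shading_level(identity: float, threshold: float = 0.8, levels: int = 2) -> int:
    """Map fractional identity (0.0 to 1.0) to a shading level.

    Returns 0 when the hit falls below the user-selectable threshold,
    `levels` for exact identity, and 1 .. levels-1 for similarity,
    binned in equal-width steps between the threshold and identity.
    """
    if levels < 2:
        raise ValueError("at least two levels (identity and similarity) are required")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0.0, 1.0)")
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must lie in [0.0, 1.0]")
    if identity < threshold:
        return 0
    if identity >= 1.0:
        return levels
    bin_width = (1.0 - threshold) / (levels - 1)
    # Clamp so that similarity never reaches the level reserved for identity.
    return 1 + min(int((identity - threshold) / bin_width), levels - 2)
```

With the default two levels, any hit at or above the threshold but short of identity receives level 1; increasing `levels` subdivides the similarity range further, up to the number of shades that can visually be discriminated.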

Where display 80 is used as a graphical user
interface, rectangles 880 can additionally provide links
directly to the sequences identified by the query of
expression databases, and/or statistical summaries thereof.
As with each of the previously discussed uses of display
80 as a graphical user interface, it should be understood
that the information accessed via display 80 need not be
resident on the computer presenting such display, which
often will be serving as a client, with the linked
information resident on one or more remotely located
servers.

Rectangle 85 displays the results of physical
assay of the sequence delimited by its left and right
borders.

Rectangle 85 can consist of a single rectangle,
thus indicating a single assay, or alternatively, and
increasingly typically, will consist of a series of
rectangles (85a, 85b, 85c) indicating separate physical
assays of the same sequence.

Where the function assayed is gene expression,
and where gene expression is assayed as herein described
using simultaneous two-color fluorescent detection of
hybridization to genome-derived single exon microarrays,
individual rectangles 85 can be colored to indicate the
degree of expression relative to control. Conveniently,
shades of green can be used to depict expression in the
sample over control values, and shades of red used to
depict expression less than control, corresponding to the
spectra of the Cy3 and Cy5 dyes conventionally used for
respective labeling thereof. Additional functional
information can be provided in the form of circles 86 (86a,
86b, 86c), where the diameter of the circle can be used to
indicate expression intensity. As discussed infra, such
relative expression (expression ratios) and absolute
expression (signal intensity) can be expressed using
normalized values.
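
The coloring convention just described can be sketched in a few lines. The sketch assumes, purely for illustration, that the expression ratio is log2-transformed and that shading saturates at a fixed log2 ratio; the function names and the parameters `max_log2` and `max_diameter` are hypothetical and are not taken from the display software.

```python
import math
from typing import Tuple


def ratio_to_rgb(sample: float, control: float, max_log2: float = 3.0) -> Tuple[int, int, int]:
    """Shade of green for sample over control, shade of red for sample under control."""
    if sample <= 0 or control <= 0:
        raise ValueError("normalized signals must be positive")
    log_ratio = math.log2(sample / control)
    # Saturate the shade once the ratio exceeds 2**max_log2 in either direction.
    level = round(255 * min(abs(log_ratio) / max_log2, 1.0))
    if log_ratio > 0:
        return (0, level, 0)   # green: expression over control
    if log_ratio < 0:
        return (level, 0, 0)   # red: expression less than control
    return (0, 0, 0)           # no difference from control


def circle_diameter(intensity: float, max_intensity: float, max_diameter: float = 10.0) -> float:
    """Scale circle diameter linearly with absolute signal intensity, capped at max_diameter."""
    if max_intensity <= 0 or intensity < 0:
        raise ValueError("intensities must be non-negative and max_intensity positive")
    return max_diameter * min(intensity / max_intensity, 1.0)
```

A log scale is chosen here so that an eightfold increase and an eightfold decrease receive shades of equal depth; a linear scale would compress all under-expression into a narrow range.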

Where display 80 is used as a graphical user
interface, rectangle 85 can be used as a link to further
information about the assay. For example, where the assay
is one for gene expression, each rectangle 85 can be used
to link to information about the source of the hybridized
mRNA, the identity of the control, raw or processed data
from the microarray scan, or the like.

FIG. 4 is a rendition of display 80 representing
gene prediction and gene expression for a hypothetical BAC,
showing conventions used in the Examples presented infra.
BAC sequence ("Chip seq.") 89 is presented, with the
physically assayed region thereof (corresponding to
rectangle 84 in FIG. 3) shown in white. Algorithmic gene
predictions are shown in field 81, with predictions by
GRAIL, predictions by GENEFINDER, and predictions by
DICTION shown. Within rectangle 87, regions of sequence
that, when used to query expression databases, return
identical or similar sequences ("EST hit") are shown as
white rectangles (corresponding to rectangles 880 in FIG.
3), gray indicates low homology, and black indicates
unknowns (where black and gray would correspond to
rectangles 88 in FIG. 3).

Although FIGS. 3 and 4 show a single stretch of
sequence, uninterrupted from left to right, longer
sequences are usefully represented by vertical stacking of
such individual Mondrians, as shown in FIGS. 9 and 10.

The methods and apparatus of the present
invention rapidly produce functional information from
genomic sequence. Where the function to be identified is
protein coding, the methods and apparatus of the present
invention rapidly identify and confirm the expression of
portions of genomic sequence that function to encode
protein. As a direct result, the methods and apparatus of
the present invention rapidly yield large numbers of
single-exon nucleic acid probes, the majority from
previously unknown genes, each of which is useful for
measuring and/or surveying expression of a specific gene in
one or more tissues or cell types.

It is, therefore, another aspect of the present
invention to provide genome-derived single exon nucleic
acid probes useful for gene expression analysis, and
particularly for gene expression analysis by microarray.

Using the methods and genome-derived single-exon
microarrays of the present invention, we have for example
readily identified a large number of unique ORFs from human
genomic sequence. Using single exon probes that encompass
these ORFs, we have demonstrated, through microarray
hybridization analysis, the expression of 13,114 of these
ORFs in bone marrow.

As would immediately be appreciated by one of
skill in the art, each single exon probe having
demonstrable expression in bone marrow is currently
available for use in measuring the level of its ORF's
expression in bone marrow.

Because bone marrow is the tissue in which blood
cells originate, diseases of the bone marrow are a
significant cause of human morbidity and mortality.
Increasingly, genetic factors are being found that
contribute to predisposition, onset, and/or aggressiveness
of most, if not all, of these diseases. Although mutations
in single genes have in some cases been identified as
causal - notably in the thalassemias and sickle cell anemia
- disorders of the bone marrow are, for the most part,
believed to have polygenic etiologies.

For example, cancers that originate in the bone
marrow and lymphatic tissues, such as the lymphomas,
leukemias, and myeloma, have been recognized as a major
health concern. An estimated 632,000 Americans are
presently living with lymphoma, leukemia or myeloma, and
over 110,000 new cases are anticipated each year. The new
cases alone account for 11% of all cancer cases reported in
the United States.

Lymphoma is a general term for a group of cancers
of lymphocytes that manifest in the tissues of the
lymphatic system. Eventually, monoclonal proliferation
crowds out healthy cells and creates tumors which enlarge
lymph nodes. Approximately 450,000 members of the U.S.
population are living with lymphoma: 160,000 with Hodgkin
disease (HD) and 290,000 with non-Hodgkin lymphoma.

Hodgkin disease (HD) is a specialized form of
lymphoma, and represents about 8% of all lymphomas. HD can
be distinguished in tissues by the presence of an abnormal
cell called the Reed-Sternberg cell. Incidence rates of HD
are higher in adolescents and young adults, but HD is
considered to be one of the most curable forms of cancer.
Symptoms of HD include painless swelling of lymph glands,
fatigue, recurrent high fever, sweating at night, skin
irritations and loss of weight.

Although an infectious etiology has been proposed
to account for the disproportionate incidence of HD among
siblings reared together - particularly an association with
Epstein Barr Virus (EBV) - multiple genetic contributions
have also been suggested.

As early as 1986, linkage to HLA was suggested,
with Klitz et al., Am. J. Hum. Genet. 54:497-505 (1994),
reporting an overall association of the nodular sclerosing
(NSHD) group with the HLA class II region. Results of the
study suggested that susceptibility to NSHD is influenced
by more than one locus within the class II region. Through a
literature search, Shugart and Collins, Europ. J.
Hum. Genet. 8:460-463 (2000), performed a combined
segregation and linkage analysis on 59 nuclear families
with HD and concluded that HD is most likely determined by
both an HLA-associated major gene and other non-HLA genetic
factors, in conjunction with environmental effects.

Non-Hodgkin lymphoma (NHL) is a malignant
monoclonal proliferation of the lymphoid cells in the
immune system, including bone marrow, spleen, liver and GI
tract. The pathologic classification of NHL continues to
evolve, reflecting new insights into the cells of origin
and the biologic bases of these heterogeneous diseases.
The course of NHL varies from indolent and initially well
tolerated to rapidly fatal. Furthermore, congestion and
edema of the face and neck, and ureteral compression, are
common clinical symptoms of NHL but are rare in HD.

Non-Hodgkin lymphoma (NHL) has been linked to a
variety of specific genetic defects, including 26 mutated
genes and at least 9 identified chromosomal translocations.
Among the mutated genes are: ALK (2p23); API2 (MIHC, cIAP2)
(11q22-q23); API4 (survivin, SW) (17q25(?)); ATM (ATA, ATC)
(11q22.3); BCL1 (11q13.3); BCL10 (CLAP, CIPER) (1p22); BCL2
(18q21.3); BCL6 (LAZ3, ZNF51) (3q27); BLYM (1p32); BMI1
(10p13); CCND1 (D11S287E, Cyclin D, PRAD1) (11q13); CD44
(MDU3, HA, MDU2) (11pter-p13); FRAT1 (10q23-q24(?)); FRAT2
(GBP) (10(?)); IL6 (IFNB2) (7p21); IRF4 (MUM1, LSIRF)
(6p25-p23); LCP1 (PLS2) (13q14.1-q14.3); MALT1 (MLT) (18q21);
MUC1 (PUM, PEM) (1q21); MYBL1 (AMYB, A-MYB) (8q22); MYC
(CMYC, C-MYC) (8q24.12-q24.13); NBS1 (8q21); NPM1 (B23)
(5q35); PCNA (20p12); TIAM1 (21q22.1); and TP53 (p53, P53)
(17q13.1). Among the chromosomal abnormalities are:
t(1;14)(p22;q32); t(14;18)(q32;q21); t(3;14)(q27;q32);
t(6;14)(p25;q32); t(11;18)(q21;q21); t(1;14)(q21;q32);
t(2;5)(p23;q35); add(14q32)/dup(14p32); and
t(11;14)(q13;q32).

Additional genetic loci, as yet undiscovered, are
believed to account for other occurrences of NHL.

As another example, acute leukemia is a malignant
disease of blood-forming tissues such as the bone marrow.
It is characterized by the uncontrolled growth of white
blood cells. As a result, immature myeloid cells (in acute
myelogenous leukemia (AML)) or lymphoid cells (in acute
lymphocytic leukemia (ALL)) rapidly accumulate and
progressively replace the bone marrow; diminished
production of normal red cells, white cells, and platelets
ensues. This loss of normal marrow function in turn gives
rise to the typical clinical complications of leukemia:
anemia, infection, and bleeding.

If untreated, ALL is rapidly fatal; most patients
die within several months of diagnosis. With appropriate
therapy, many patients can be cured. The survival rates for
patients diagnosed with AML and ALL are 14% and 58%,
respectively. However, the incidence of AML is expected
to be greater than that of ALL: an estimated 10,000 new
cases of AML, predominantly in older adults, are anticipated
in the U.S. alone, whereas 3,100 new cases of ALL are expected,
with 1,500 of these new cases occurring among children.

The etiology of acute leukemia is not known.
Although human T-cell lymphotropic virus type I (HTLV-I), a
causative agent of adult T-cell leukemia, and HTLV-II,
obtained from several patients with a syndrome resembling
hairy cell leukemia, have been isolated, the etiologic link
between HTLV and malignancy is uncertain. There is,
however, evidence which suggests a genetic predisposition
to acute leukemia.

For example, genetic disorders such as Fanconi
anemia and Down syndrome appear to increase risk of acute
leukemia, specifically, AML. Evidence supporting a
chromosome 21 locus for acute myelogenous leukemia (AML)
includes the finding of linkage to 21q22.1-q22.2 in a
family with a platelet disorder and propensity to develop
AML (Ho et al., Blood 87:5218-5224 (1996)), an increased
incidence of leukemia in Down syndrome, and frequent
somatic translocation in leukemia involving the CBFA gene
on 21q22.3. In addition, Horwitz et al., Am. J. Hum.
Genet. 61:873-881 (1997), suggest that a gene on 16q22 may
be a second cause of acute myelogenous leukemia.
Nonparametric linkage analysis gave a P-value of 0.00098
for the conditional probability of linkage. Mutational
analysis excluded expansion of the AT-rich minisatellite
repeat FRA16B fragile site and the CAG trinucleotide repeat
in the E2F-4 transcription factor. Large CAG repeat
expansion was excluded as a cause of leukemia in this
family.

Similarly, acute lymphoblastic leukemia (ALL) has
been suggested to have a genetic predisposition. In
particular, linkage to chromosome 9p has been reported by a
number of groups. Chilcote et al., New Eng. J. Med.
313:286-291 (1985), found that 6 of 8 patients with clinical
features of lymphomatous ALL (LALL), a distinct category of
ALL of T-cell lineage, had karyotypic abnormalities leading
to loss of bands 9p22-p21. The mechanisms varied and
included deletions, unbalanced translocations, and loss of
the entire chromosome; only 1 of 57 patients without LALL
had an abnormality of chromosome 9 at diagnosis. Kowalczyk
et al., Cancer Genet. Cytogenet. 9:383-385 (1981), had
earlier found changes in 9p in a subgroup of ALL cases.
Chilcote et al. (1985) pointed out that there is a fragile
site at 9p21 and raised the question of familial
predisposition on this basis. This fragile site is the
breakpoint in the translocation t(9;11)(p21-22;q23), which

WO 01/57276 PCT/US01/00668 

is associated with acute nonlymphocytic leukemia
with monocytic features, ANLL-AMoL-M5a. In a large series,
Murphy et al., New Eng. J. Med. 313:1611 (1985), confirmed
an abnormality of 9p in 10 to 11% of cases (33 out of more
than 300) of acute lymphoblastic leukemia. The breakpoints
in 9p clustered in the p22-p21 region. They could not,
however, corroborate the specific association with T-cell
origin or so-called lymphomatous clinical features. In
addition, Taki et al., Proc. Natl. Acad. Sci. USA 96:14535
(1999), recently identified AF5q31, a new AF4-related gene,
fused to MLL in infant ALL with ins(5;11)(q31;q13q23), and
suspect that AF5q31 and AF4 might define a new family
particularly involved in the pathogenesis of
11q23-associated ALL.

Yet a further example of a disease affecting
bone marrow with likely polygenic etiology is multiple
myeloma (MM).

MM is a cancer of plasma cells, the final
differentiated stage of B lymphocyte maturation. The
malignant clone proliferates in the bone marrow and
frequently invades the adjacent bone, producing extensive
skeletal destruction that results in bone pain and
fractures. Anemia, hypercalcemia, and renal failure are
some clinical manifestations associated with MM.

MM causes 1% of all cancer deaths in Western
countries. A genetic component to its etiology is
suggested by disparate incidence among various groups in
the country. Its incidence is higher in men than in women,
in people of African descent relative to the U.S.
population at large, and in older adults as compared to the
young. It has been estimated that 14,000 new cases of
myeloma will be diagnosed in the U.S., and over 11,000
persons will die from MM within the year.

Although Kaposi's sarcoma-associated herpes
virus has been associated with MM (Retig et al., Science
276:1851 (1997)), there is evidence that chromosomal
abnormalities, such as the deletion of 13q14 and
rearrangements of 14q, increase the proliferation of myeloma
cells.

Up to 30% of patients who suffer with MM have a
balanced translocation, t(4;14)(p16.3;q32), that places the
fibroblast growth factor receptor 3 (FGFR3) gene under the
control of IgH promoter elements (Chesi et al., Nat. Genet.
16:260 (1997)). This results in increased expression of
FGFR3, a member of a family of tyrosine kinase receptors
implicated in control of cellular proliferation.

According to Zoger et al., Blood 95:1925 (2000),
monoallelic deletions of the retinoblastoma-1 (rb-1) gene
and the D13S319 locus were observed in 48 of 104 patients
(46.2%) and in 28 of 72 (38.9%) patients, respectively,
with newly diagnosed MM. Fluorescence in situ
hybridization (FISH) studies found that 13q14 was deleted
in all 17 patients with karyotypic evidence of monosomy 13
or deletion of 13q but also in 9 of 19 patients with
apparently normal karyotypes. Patients with a 13q14
deletion were more likely to have higher serum levels of
beta(2)-microglobulin (P = 0.059) and a higher percentage of
bone marrow plasma cells (P = 0.085) than patients with a
normal 13q14 status on FISH analysis. In patients with a
deletion of 13q14, myeloma cell proliferation was markedly
increased. The presence of a 13q14 deletion on FISH
analysis was associated with a significantly lower rate of
response to conventional-dose chemotherapy
(40.8% compared with 78.6%; P = .009) and a shorter overall
survival (24.2 months compared with > 60 months; P < .005)
than in patients without the deletion.

There are numerous other mutated genes and
chromosomal abnormalities that may predispose to MM.
Examples of such genes are: B2M (15q21-q22); CCND1
(D11S287E, Cyclin D, PRAD1) (11q13); CD19 (16p11.2); HGF
(HPTA) (7q21.1); IL6 (IFNB2) (7p21); IRF4 (MUM1, LSIRF)
(6p25-p23); LTA (TNFB, LT) (6p21.3); SDC1 (2p24.1); and TNF
(TNFA, TNFSF2, DIF) (6p21.3). Examples of chromosomal
abnormalities include: t(6;14)(p25;q32) and
t(11;14)(q13;q32).

Other significant diseases or disorders of the
bone marrow are also believed to have, or are likely to
have, a genetic, typically polygenic, etiologic component.
These diseases include, for example, chronic myeloid leukemia,
chronic lymphoid leukemia, polycythemia vera,
myelofibrosis, primary thrombocythemia, myelodysplastic
syndromes, Wiskott-Aldrich, lymphoproliferative syndrome,
aplastic anemia, Fanconi anemia, Down syndrome, sickle cell
disease, thalassemia, granulocyte disorders, Kostmann
syndrome, chronic granulomatous disease, Chediak-Higashi
syndrome, platelet disorders, Glanzmann thrombasthenia,
Bernard-Soulier syndrome, metabolic storage diseases,
osteoporosis, and congenital hemophagocytic syndrome.

The human genome-derived single exon nucleic acid
probes and microarrays of the present invention are useful
for predicting, diagnosing, grading, staging, monitoring
and prognosing diseases of human bone marrow, particularly
those diseases with polygenic etiology. With each of the
single exon probes described herein shown to be expressed
at detectable levels in human bone marrow, and with about
2/3 of the probes identifying novel genes, the single exon
microarrays of the present invention provide exceptionally
high informational content for such studies.

For example, diagnosis, grading, and/or staging
of a disease can be based upon the quantitative relatedness
of a patient gene expression profile to one or more
reference expression profiles known to be characteristic of
a given bone marrow disease, or to specific grades or
stages thereof.

In one embodiment, the patient gene expression
profile is generated by hybridizing nucleic acids obtained
directly or indirectly from transcripts expressed in the
patient's bone marrow (or cells cultured therefrom) to the
genome-derived single exon microarray of the present
invention. Reference profiles are obtained similarly by
hybridizing nucleic acids obtained directly or indirectly
from transcripts expressed in the bone marrow of
individuals with known disease. Methods for quantitatively
relating gene expression profiles, without regard to the
function of the protein encoded by the gene, are disclosed
in WO 99/58720, incorporated herein by reference in its
entirety.
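
As a minimal sketch of what quantitatively relating profiles can look like, the example below ranks reference profiles by their Pearson correlation with a patient profile. Pearson correlation is substituted here as a familiar similarity measure; it is not asserted to be the method of WO 99/58720, and the function names, profile labels, and numerical values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def rank_references(
    patient: Sequence[float], references: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Order reference profiles from most to least related to the patient profile."""
    scored = [(name, pearson(patient, profile)) for name, profile in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles measured over the same four probes.
patient_profile = [1.0, -2.0, 0.5, 3.0]
reference_profiles = {
    "reference_A": [2.0, -4.0, 1.0, 6.0],
    "reference_B": [-1.0, 2.0, -0.5, -3.0],
}
ranking = rank_references(patient_profile, reference_profiles)
```

Because the comparison uses only the measured values, it requires no knowledge of the function of the protein encoded by any gene represented on the microarray.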

In another approach, the genome-derived single
exon probes and microarrays of the present invention can be
used to interrogate genomic DNA, rather than pools of
expressed message; this latter approach permits
predisposition to and/or prognosis of diseases of bone
marrow to be assessed through the massively parallel
determination of altered copy number, deletion, or mutation
in the patient's genome of exons known to be expressed in
human bone marrow. The algorithms set forth in WO 99/58720
can be applied to such genomic profiles without regard to
the function of the protein encoded by the interrogated
gene.

The utility is specific to the probe; at
sufficiently high hybridization stringency, which
stringencies are well known in the art - see Ausubel et al.
and Maniatis et al. - each probe reports the level of
expression of message specifically containing that ORF.

It should be appreciated, however, that the
probes of the present invention, for which expression in
the bone marrow has been demonstrated, are useful both for
measurement in the bone marrow and for survey of expression
in other tissues.

Significant among such advantages is the presence
of probes for novel genes.

As mentioned above and further detailed in
Examples 1 and 2, the methods described enable ORFs which
are not present in existing expression databases to be
identified. And the fewer the number of tissues in which
the ORF can be shown to be expressed, the more likely the
ORF will prove to be part of a novel gene: as further
discussed in Example 2, ORFs whose expression was
measurable in only a single one of the tested tissues were
represented in existing expression databases at a rate of
only 11%, whereas 36% of ORFs whose expression was
measurable in 9 tissues were present in existing expression
databases, and fully 45% of those ORFs expressed in all ten
tested tissues were present in existing expressed sequence
databases.

Either as tools for measuring gene expression or
tools for surveying gene expression, the genome-derived
single exon probes of the present invention have
significant advantages over the cDNA or EST-based probes
that are currently available for achieving these utilities.

The genome-derived single exon probes of the
present invention are useful in constructing genome-derived
single exon microarrays; the genome-derived single exon
microarrays, in turn, are useful devices for measuring and
for surveying gene expression in the human.

Gene expression analysis using microarrays -
conventionally using microarrays having probes derived from
expressed message - is well-established as useful in the
biological research arts (see Lockhart et al., Nature
405:827-836).

Microarrays have been used to determine gene
expression profiles in cells in response to drug treatment
(see, for example, Kaminski et al., "Global Analysis of
Gene Expression in Pulmonary Fibrosis Reveals Distinct
Programs Regulating Lung Inflammation and Fibrosis," Proc.
Natl. Acad. Sci. USA 97(4):1778-83 (2000); Bartosiewicz et
al., "Development of a Toxicological Gene Array and
Quantitative Assessment of This Technology," Arch. Biochem.
Biophys. 376(1):66-73 (2000)), viral infection (see, for
example, Geiss et al., "Large-scale Monitoring of Host Cell
Gene Expression During HIV-1 Infection Using cDNA
Microarrays," Virology 266(1):8-16 (2000)) and during cell
processes such as differentiation, senescence and apoptosis
(see, for example, Shelton et al., "Microarray Analysis of
Replicative Senescence," Curr. Biol. 9(17):939-45 (1999);
Voehringer et al., "Gene Microarray Identification of Redox
and Mitochondrial Elements That Control Resistance or
Sensitivity to Apoptosis," Proc. Natl. Acad. Sci. USA
97(6):2680-5 (2000)).

Microarrays have also been used to determine
abnormal gene expression in diseased tissues (see, for
example, Alon et al., "Broad Patterns of Gene Expression
Revealed by Clustering Analysis of Tumor and Normal Colon
Tissues Probed by Oligonucleotide Arrays," Proc. Natl.
Acad. Sci. USA 96(12):6745-50 (1999); Perou et al.,
"Distinctive Gene Expression Patterns in Human Mammary
Epithelial Cells and Breast Cancers," Proc. Natl. Acad. Sci.
USA 96(16):9212-7 (1999); Wang et al., "Identification of
Genes Differentially Over-expressed in Lung Squamous Cell
Carcinoma Using Combination of cDNA Subtraction and
Microarray Analysis," Oncogene 19(12):1519-28 (2000);
Whitney et al., "Analysis of Gene Expression in Multiple
Sclerosis Lesions Using cDNA Microarrays," Ann. Neurol.
46(3):425-8 (1999)), in drug discovery screens (see, for
example, Scherf et al., "A Gene Expression Database for the
Molecular Pharmacology of Cancer," Nat. Genet. 24(3):236-44
(2000)) and in diagnosis to determine appropriate treatment
strategies (see, for example, Sgroi et al., "In vivo Gene
Expression Profile Analysis of Human Breast Cancer
Progression," Cancer Res. 59(22):5656-61 (1999)).

In microarray-based gene expression screens of
pharmacological drug candidates upon cells, each probe
provides specific useful data. In particular, it should be
appreciated that even those probes that show no change in
expression are as informative as those that do change,
serving, in essence, as negative controls.

For example, where gene expression analysis is
used to assess toxicity of chemical agents on cells, the
failure of the agent to change a gene's expression level is
evidence that the drug likely does not affect the pathway
of which the gene's expressed protein is a part.
Analogously, where gene expression analysis is used to
assess side effects of pharmacological agents - whether in
lead compound discovery or in subsequent screening of lead
compound derivatives - the inability of the agent to alter
a gene's expression level is evidence that the drug does
not affect the pathway of which the gene's expressed
protein is a part.

WO 99/58720 provides methods for quantifying the
relatedness of a first and second gene expression profile
and for ordering the relatedness of a plurality of gene
expression profiles. The methods so described permit
useful information to be extracted from a greater
percentage of the individual gene expression measurements
from a microarray than methods previously used in the art.

Other uses of microarrays are described in
Gerhold et al., Trends Biochem. Sci. 24(5):168-173 (1999)
and Zweiger, Trends Biotechnol. 17(11):429-436 (1999);
Schena et al.

The invention particularly provides genome-derived
single-exon probes known to be expressed in bone
marrow. The individual single exon probes can be
provided in the form of substantially isolated and purified
nucleic acid, typically, but not necessarily, in a quantity
sufficient to perform a hybridization reaction.

Such nucleic acid can be in any form directly
hybridizable to the message that contains the probe's ORF,
such as double stranded DNA, single-stranded DNA
complementary to the message, single-stranded RNA
complementary to the message, or chimeric DNA/RNA molecules
so hybridizable. The nucleic acid can alternatively or
additionally include either nonnative nucleotides,
alternative internucleotide linkages, or both, so long as
complementary binding can be obtained. For example, probes
can include phosphorothioates, methylphosphonates,
morpholino analogs, and peptide nucleic acids (PNA), as are
described, for example, in U.S. Patent Nos. 5,142,047;
5,235,033; 5,166,315; 5,217,866; 5,184,444; 5,861,250.

Usefully, however, such probes are provided in a
form and quantity suitable for amplification, where the
amplified product is thereafter to be used in the
hybridization reactions that probe gene expression.
Typically, such probes are provided in a form and quantity
suitable for amplification by PCR or by another well-known
amplification technique. One such technique additional to
PCR is rolling circle amplification, as is described, inter
alia, in U.S. Patent Nos. 5,854,033 and 5,714,320 and
international patent publications WO 97/19193 and
WO 00/15779. As is well understood, where the probes are
to be provided in a form suitable for amplification, the
range of nucleic acid analogues and/or internucleotide
linkages will be constrained by the requirements and nature
of the amplification enzyme.

Where the probe is to be provided in a form
suitable for amplification, the quantity need not be
sufficient for direct hybridization for gene expression
analysis, and need be sufficient only to function as an
amplification template, typically at least about 1, 10 or
100 pg or more.

Each discrete amplifiable probe can also be
packaged with amplification primers, either in a single
composition that comprises probe template and primers, or
in a kit that comprises such primers separately packaged
therefrom. As earlier mentioned, the ORF-specific
5' primers used for genomic amplification can have a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification can have a second,
different, common sequence added thereto, thus permitting,
in this embodiment, the use of a single set of 5' and 3'
primers to amplify any one of the probes. The probe
composition and/or kit can also include buffers, enzyme,
etc., required to effect amplification.
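
The common-sequence arrangement can be illustrated with a
short sketch. It is offered only as an illustration: the
tail sequences and genomic fragments are arbitrary
placeholders, and the function names are hypothetical and do
not come from this description.

```python
# Sketch: why one "universal" primer pair can amplify every tailed probe.
# All sequences below are placeholders chosen for illustration only.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def tailed_product(genomic_fragment: str, tail5: str, tail3: str) -> str:
    """Top strand of the product made with tailed ORF-specific primers.

    The first common sequence sits at the 5' end; the reverse complement
    of the second common sequence sits at the 3' end.
    """
    return tail5 + genomic_fragment + revcomp(tail3)


def universal_pair_primes(product: str, tail5: str, tail3: str) -> bool:
    """True if the universal 5' and 3' primers match the product ends."""
    return product.startswith(tail5) and product.endswith(revcomp(tail3))


TAIL5 = "GTTTTCCCAGTCACGAC"  # hypothetical first common sequence
TAIL3 = "CAGGAAACAGCTATGAC"  # hypothetical second common sequence

probe_a = tailed_product("ATGGCCAAATTTGGG", TAIL5, TAIL3)
probe_b = tailed_product("CCCTTTGGGAAACCC", TAIL5, TAIL3)
print(universal_pair_primes(probe_a, TAIL5, TAIL3))
print(universal_pair_primes(probe_b, TAIL5, TAIL3))
```

Because every product carries the same two end sequences
regardless of the ORF it contains, the same primer pair
serves for each probe in the sketch.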

As mentioned earlier, when intended for use on a
genome-derived single exon microarray of the present
invention, the genome-derived single exon probes of the
present invention will typically average at least about
100, 200, 300, 400 or 500 bp in length, including (and
typically, but not necessarily, centered about) the ORF.
Furthermore, when intended for use on a genome-derived
single exon microarray of the present invention, the
genome-derived single exon probes of the present invention
will typically not contain a detectable label.

When intended for use in solution phase
hybridization, however (that is, for use in a
hybridization reaction in which the probe is not first
bound to a support substrate, although the target may
indeed be so bound), the length constraints that are imposed
in microarray-based hybridization approaches will be
relaxed, and such probes will typically be labeled.

In such case, the only functional constraint that
dictates the minimum size of such probe is that each such
probe must be capable of specifically identifying in a
hybridization reaction the exon from which it is drawn. In
theory, a probe of as little as 17 nucleotides is capable
of uniquely identifying its cognate sequence in the human
genome. For hybridization to expressed message (a subset
of target sequence that is much reduced in complexity as
compared to genomic sequence), even fewer nucleotides are
required for specificity.
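
The 17-nucleotide figure can be motivated by a simple
counting argument, sketched below for illustration only. The
sketch assumes a random-sequence model and a haploid genome
size of about 3.2 x 10^9 nucleotides; both are assumptions
introduced here, as are the function name and the size of
the example message pool.

```python
def min_unique_length(target_size_nt: int, strands: int = 2) -> int:
    """Smallest k for which the number of possible k-mers (4**k) exceeds
    the number of k-mer start positions in the target.

    This is an idealized random-sequence estimate, not a guarantee of
    specificity for any particular probe.
    """
    positions = target_size_nt * strands
    k = 1
    while 4 ** k <= positions:
        k += 1
    return k


HUMAN_GENOME_NT = 3_200_000_000   # assumed approximate haploid genome size
MESSAGE_POOL_NT = 100_000_000     # hypothetical size of an expressed-message pool

print(min_unique_length(HUMAN_GENOME_NT))             # both genomic strands
print(min_unique_length(MESSAGE_POOL_NT, strands=1))  # single-stranded message
```

Under these assumptions the first call returns 17 and the
second returns 14, consistent with the statement that fewer
nucleotides suffice when the target is expressed message
rather than genomic DNA.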

Therefore, the probes of the present invention
can include as few as 20, 25 or 50 bp of ORF, or more. In
particular embodiments, the ORF sequences are given in SEQ
ID NOS. 13,115 - 26,012, respectively, for probe SEQ ID
NOS. 1 - 13,114. The minimum amount of ORF required to be
included in the probe of the present invention in order to
provide specific signal in either solution phase or
microarray-based hybridizations can readily be determined
for each of ORF SEQ ID NOS. 13,115 - 26,012 individually
by routine experimentation using standard high stringency
conditions.

Such high stringency conditions are described,
inter alia, in Ausubel et al. and Maniatis et al. For
microarray-based hybridization, standard high stringency
conditions can usefully be 50% formamide, 5X SSC, 0.2 µg/µl
poly(dA), 0.2 µg/µl human Cot-1 DNA, and 0.5% SDS, in a
humid oven at 42°C overnight, followed by successive washes
of the microarray in 1X SSC, 0.2% SDS at 55°C for 5
minutes, and then 0.1X SSC, 0.2% SDS, at 55°C for 20
minutes. For solution phase hybridization, standard high
stringency conditions can usefully be aqueous hybridization
at 65°C in 6X SSC. Lower stringency conditions, suitable
for cross-hybridization to mRNA encoding structurally- and
functionally-related proteins, can usefully be the same as
the high stringency conditions but with reduction in
temperature for hybridization and washing to room
temperature (approximately 25°C).

When intended for use in solution phase
hybridization, the maximum size of the single exon probes
of the present invention is dictated by the proximity of
other expressed exons in genomic DNA: although each single
exon probe can include intergenic and/or intronic material
contiguous to the ORF in the human genome, each probe of
the present invention will include portions of only one
expressed exon.

Thus, each single exon probe will include no more
than about 25 kb of contiguous genomic sequence, more
typically no more than about 20 kb of contiguous genomic
sequence, more usually no more than about 15 kb, even more
usually no more than about 10 kb. Usually, probes that are
maximally about 5 kb will be used, more typically no more
than about 3 kb.

It will be appreciated that the Sequence Listing
appended hereto presents, by convention, only that strand
of the probe and ORF sequence that can be directly
translated reading from 5' to 3' end. As would be well
understood by one of skill in the art, single stranded
probes must be complementary in sequence to the ORF as
present in an mRNA; it is well within the skill in the art
to determine such complementary sequence. It will further
be understood that double stranded probes can be used in
both solution-phase hybridization and microarray-based
hybridization if suitably denatured.
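
Determining the complementary strand from a listed
(directly translatable) strand is routine; a minimal sketch
follows. The function name and the example sequence are
illustrative assumptions, not entries from the Sequence
Listing.

```python
# Map each base to its Watson-Crick partner; N (unknown base) maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(listed_strand: str) -> str:
    """Return the strand complementary to the listed strand, written 5' to 3'.

    A single-stranded probe with this sequence can base-pair with an mRNA
    that carries the listed (sense) sequence.
    """
    return listed_strand.translate(_COMPLEMENT)[::-1]


example_sense = "ATGGCC"  # hypothetical fragment, not taken from the Sequence Listing
print(reverse_complement(example_sense))
```

For the hypothetical fragment shown, the result is GGCCAT.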

Thus, it is an aspect of the present invention to
provide single-stranded nucleic acid probes that have
sequence complementary to those described herein above and
below, and double-stranded probes one strand of which has
sequence complementary to the probes described herein.

The probes can, but need not, contain intergenic
and/or intronic material that flanks the ORF, on one or
both sides, in the same linear relationship to the ORF that
the intergenic and/or intronic material bears to the ORF in
genomic DNA. The probes do not, however, contain nucleic
acid derived from more than one expressed ORF.

When intended for use in solution
hybridization, the probes of the present invention can
usefully have detectable labels. Nucleic acid labels are
well known in the art, and include, inter alia, radioactive
labels, such as ³H, ³²P, ³³P, ³⁵S, ¹²⁵I, ¹³¹I; fluorescent
labels, such as Cy3, Cy5, Cy5.5, Cy7, SYBR®
Green and other labels described in Haugland,
Handbook of Fluorescent Probes and Research Chemicals, 7th
ed., Molecular Probes Inc., Eugene, OR (2000), or
fluorescence resonance energy transfer tandem conjugates
thereof; labels suitable for chemiluminescent and/or
enhanced chemiluminescent detection; labels suitable for
ESR and NMR detection; and labels that include one member
of a specific binding pair, such as biotin, digoxigenin, or
the like.

The probes, either in quantity sufficient for
hybridization or sufficient for amplification, can be
provided in individual vials or containers.

Alternatively, such probes can usefully be
packaged as a plurality of such individual genome-derived
single exon probes.

When provided as a collection of plural
individual probes, the probes are typically made available
in amplifiable form in a spatially-addressable ordered set,
typically one per well of a microtiter dish. Although a
96-well microtiter plate can be used, greater efficiency is
obtained using higher density arrays.

If, as earlier mentioned, the ORF-specific
5' primers used for genomic amplification had a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification had a second,
different, common sequence added thereto, a single set of
5' and 3' primers can be used to amplify all of the probes
from the amplifiable ordered set.

Such collections of genome-derived single exon
probes can usefully include a plurality of probes chosen
for the common attribute of expression in the human bone
marrow.

In such defined subsets, typically at least 50,
60, 75, 80, 85, 90 or 95% or more of the probes will be
chosen by their expression in the defined tissue or cell
type.

The single exon probes of the present invention,
as well as fragments of the single exon probes comprising
selectively hybridizable portions of the probe ORF, can be
used to obtain the full length cDNA that includes the ORF
by (i) screening of cDNA libraries; (ii) rapid
amplification of cDNA ends ("RACE"); or (iii) other
conventional means, as are described, inter alia, in
Ausubel et al. and Maniatis et al.

It is another aspect of the present invention to
provide genome-derived single exon nucleic acid microarrays
useful for gene expression analysis, where the term
"microarray" has the meaning given in the definitional
section of this description, supra.

The invention particularly provides genome-derived
single-exon nucleic acid microarrays comprising a
plurality of probes known to be expressed in human bone
marrow. In preferred embodiments, the present invention
provides human genome-derived single exon microarrays
comprising a plurality of probes drawn from the group
consisting of SEQ ID NOS.: 1 - 13,114.

When used for gene expression analysis, such
defined-subset genome-derived single exon microarrays
provide greater physical informational density than do
genome-derived single exon microarrays that have lower
percentages of probes known to be expressed commonly in the
tested tissue. At a fixed probe density, for example, a
given microarray surface area of the defined subset
genome-derived single exon microarray can yield a greater
number of expression measurements. Alternatively, at a
given probe density, the same number of expression
measurements can be obtained from a smaller substrate
surface area. Alternatively, at a fixed probe density and
fixed surface area, probes can be provided redundantly,
providing greater reliability in signal measurement for any
given probe. Furthermore, with a higher percentage of
probes known to be expressed in the assayed tissue, the
dynamic range of the detection means can be adjusted to
reveal finer levels of discrimination among the levels of
expression.

Although particularly described with respect to
their utility as probes of gene expression, particularly as
probes to be included on a genome-derived single exon
microarray, each of the nucleic acids having SEQ ID NOS.: 1
- 13,114 contains an open-reading frame, set forth
respectively in SEQ ID NOS.: 13,115 - 26,012, that encodes
a protein domain. Thus, each of SEQ ID NOS. 1 - 13,114 can
be used, or that portion thereof in SEQ ID NOS. 13,115 -
26,012 used, to express a protein domain, by standard in
vitro recombinant techniques. See Ausubel et al. and
Maniatis et al.

Additionally, kits are available commercially
that readily permit such nucleic acids to be expressed as
protein in bacterial cells, insect cells, or mammalian
cells, as desired (e.g., HAT™ Protein Expression &
Purification System, ClonTech Laboratories, Palo Alto, CA;
Adeno-X™ Expression System, ClonTech Laboratories, Palo
Alto, CA; Protein Fusion & Purification (pMAL™) System, New
England Biolabs, Beverly, MA).

Furthermore, shorter peptides can be chemically
synthesized using commercial peptide synthesizing equipment
and well known techniques. Procedures are described, inter
alia, in Chan et al. (eds.), Fmoc Solid Phase Peptide
Synthesis: A Practical Approach (Practical Approach Series,
(Paper)), Oxford Univ. Press (March 2000) (ISBN:
0199637245); Jones, Amino Acid and Peptide Synthesis
(Oxford Chemistry Primers, No 7), Oxford Univ. Press
(August 1992) (ISBN: 0198556683); and Bodanszky, Principles
of Peptide Synthesis (Springer Laboratory), Springer Verlag
(December 1993) (ISBN: 0387564314).

It is, therefore, another aspect of the invention
to provide peptides comprising an amino acid sequence
translated from SEQ ID NOS.: 13,115 - 26,012. Such amino
acid sequences are set out in SEQ ID NOS: 26,013 - 38,628.
Any such recombinantly-expressed or synthesized peptide of
at least 8, and preferably at least about 15, amino acids,
can be conjugated to a carrier protein and used to generate
antibody that recognizes the peptide. Thus, it is a
further aspect of the invention to provide peptides that
have at least 8, preferably at least 15, consecutive amino
acids.

The following examples are offered by way of 
illustration and not by way of limitation. 

EXAMPLE 1

Preparation of Single Exon Microarrays from ORFs Predicted 
in Human Genomic Sequence 

Bioinformatics Results

All human BAC sequences in fewer than 10 pieces
that had been accessioned in a five month period
immediately preceding this study were downloaded from
GenBank. This corresponds to ~2200 clones, totaling ~350
MB of sequence, or approximately 10% of the human genome.

After masking repetitive elements using the
program CROSSMATCH, the sequence was analyzed for open
reading frames using three separate gene finding programs.
The three programs predict genes using independent
algorithmic methods developed on independent training sets:
GRAIL uses a neural network, GENEFINDER uses a hidden
Markov model, and DICTION, a program proprietary to
Genetics Institute, operates according to a different
heuristic. The results of all three programs were used to
create a prediction matrix across the segment of genomic
DNA.

The three gene finding programs yielded a range
of results. GRAIL identified the greatest percentage of
genomic sequence as putative coding region, 2% of the data
analyzed. GENEFINDER was second, calling 1%, and DICTION
yielded the least putative coding region, with 0.8% of
genomic sequence called as coding region.

The consensus data were as follows. GRAIL and
GENEFINDER agreed on 0.7% of genomic sequence, GRAIL and
DICTION agreed on 0.5% of genomic sequence, and the three
programs together agreed on 0.25% of the data analyzed.
That is, 0.25% of the genomic sequence was identified by
all three of the programs as containing putative coding
region.
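
Agreement between gene-finding programs of the kind reported
here can be computed from per-base calls. The sketch below
is an illustration only: it assumes each program's output
has been reduced to one 0/1 call per base, uses toy data,
and does not reproduce the actual GRAIL, GENEFINDER or
DICTION output formats.

```python
def consensus_mask(calls, min_votes=2):
    """Per-base consensus across several predictors.

    `calls` is a list of equal-length 0/1 sequences, one per program.
    A base is kept when at least `min_votes` programs call it coding.
    """
    return [sum(column) >= min_votes for column in zip(*calls)]


def coding_fraction(mask):
    """Fraction of bases flagged as putative coding region."""
    return sum(mask) / len(mask)


# Toy per-base calls over a 5-base segment (hypothetical data).
program_a = [1, 1, 1, 0, 0]
program_b = [0, 1, 1, 0, 0]
program_c = [0, 0, 1, 1, 0]

two_of_three = consensus_mask([program_a, program_b, program_c], min_votes=2)
all_three = consensus_mask([program_a, program_b, program_c], min_votes=3)
print(two_of_three, coding_fraction(two_of_three))
print(all_three, coding_fraction(all_three))
```

Raising `min_votes` from 2 to 3 moves from a two-program
agreement to the stricter three-program agreement.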

ORFs predicted by any two of the three programs
("consensus ORFs") were assorted into "gene bins" using two
criteria: (1) any 7 consecutive exons within a 25 kb window
were placed together in a bin as likely contributing to a
single gene, and (2) all ORFs within a 25 kb window were
placed together in a bin as likely contributing to a single
gene if fewer than 7 exons were found within the 25 kb
window.
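
One possible reading of the two binning criteria is a greedy
left-to-right pass over ORF start coordinates, sketched
below. This is an interpretation offered for illustration,
not the procedure actually used; the function name, the use
of start coordinates alone, and the example coordinates are
assumptions.

```python
def bin_orfs(orf_starts, window=25_000, max_per_bin=7):
    """Group ORF start coordinates into "gene bins".

    An ORF joins the current bin when it starts within `window` bases of
    the bin's first ORF and the bin holds fewer than `max_per_bin` ORFs;
    otherwise it opens a new bin.
    """
    bins = []
    for start in sorted(orf_starts):
        current = bins[-1] if bins else None
        if current and start - current[0] <= window and len(current) < max_per_bin:
            current.append(start)
        else:
            bins.append([start])
    return bins


# Hypothetical coordinates on one BAC.
print(bin_orfs([0, 1_000, 30_000, 31_000]))
print(bin_orfs(list(range(0, 800, 100))))  # eight closely spaced ORFs
```

In the second call the eighth ORF opens a new bin, since the
first bin is capped at seven.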

PCR 

The largest ORF from each gene bin that did not
span repetitive sequence was then chosen for amplification,
as were all consensus ORFs longer than 500 bp. This method
approximated one exon per gene; however, a number of genes
were found to be represented by multiple elements.

Previously, we had determined that DNA fragments
shorter than 250 bp in length do not bind well to the
amino-modified glass surface of the slides used as support
substrate for construction of microarrays; therefore,
amplicons were designed in the present experiments to
approximate 500 bp in length.

Accordingly, after selecting the largest ORF per
gene bin, a 500 bp fragment of sequence centered on the ORF
was passed to the primer picking software, PRIMER3
(available online for use at
http://www-genome.wi.mit.edu/cgi-bin/primer/). A first
additional sequence was commonly added to each ORF-unique
5' primer, and a second, different, additional sequence was
commonly added to each ORF-unique 3' primer, to permit
subsequent reamplification of the amplicon using a single
set of "universal" 5' and 3' primers, thus immortalizing
the amplicon. The addition of universal priming sequences
also facilitates sequence verification, and can be used to
add a cloning site should some ORFs be found to warrant
further study.
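
Selecting a fixed-length fragment centered on an ORF can be
sketched as follows. This is an illustrative sketch only:
the coordinate convention (0-based, half-open), the clamping
at sequence ends, and the function name are assumptions, and
the sketch does not call or reproduce PRIMER3 itself.

```python
def centered_window(orf_start, orf_end, length=500, seq_len=None):
    """Return (start, end) of a `length`-bp window centered on an ORF.

    Coordinates are 0-based and half-open. The window is shifted, and
    shortened only if unavoidable, so that it stays within the sequence.
    """
    midpoint = (orf_start + orf_end) // 2
    start = max(0, midpoint - length // 2)
    end = start + length
    if seq_len is not None and end > seq_len:
        end = seq_len
        start = max(0, end - length)
    return start, end


# Hypothetical 150 bp ORF at positions 1000-1150 of a genomic clone.
print(centered_window(1000, 1150))
# ORF close to the start of the clone: window is clamped at 0.
print(centered_window(10, 60))
```

The first call returns (825, 1325), a 500 bp fragment with
the ORF in the middle; the second returns (0, 500).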

The ORFs were then PCR amplified from genomic
DNA, verified on agarose gels, and sequenced using the
universal primers to validate the identity of the amplicon
to be spotted in the microarray.

Primers were supplied by Operon Technologies
(Alameda, CA). PCR amplification was performed by standard
techniques using human genomic DNA (Clontech, Palo Alto,
CA) as template. Each PCR product was verified by SYBR®
green (Molecular Probes, Inc., Eugene, OR) staining of
agarose gels, with subsequent imaging by Fluorimager
(Molecular Dynamics, Inc., Sunnyvale, CA). PCR
amplification was classified as successful if a single band
appeared.

The success rate for amplifying ORFs of interest
directly from genomic DNA using PCR was approximately 75%.
FIG. 5 graphs the distribution of predicted ORF (exon)
length and distribution of amplified PCR products, with ORF
length shown in red and PCR product length shown in blue
(which may appear black in the figure). Although the range
of ORF sizes is readily seen to extend to beyond 900 bp,
the mean predicted exon size was only 229 bp, with a median
size of 150 bp (n=9498). With an average amplicon size of
475 ± 25 bp, approximately 50% of the average PCR
amplification product contained predicted coding region,
with the remaining 50% of the amplicon containing either
intron, intergenic sequence, or both.
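
The "approximately 50%" figure follows from the mean exon
and amplicon sizes just given. The short calculation below
is included only as a check of that arithmetic; the function
name is an illustrative assumption.

```python
MEAN_ORF_BP = 229  # mean predicted exon size stated above


def coding_fraction(orf_bp, amplicon_bp):
    """Fraction of an amplicon occupied by predicted coding region."""
    return min(orf_bp, amplicon_bp) / amplicon_bp


# Amplicon sizes spanning the stated 475 +/- 25 bp range.
for amplicon_bp in (450, 475, 500):
    print(amplicon_bp, round(coding_fraction(MEAN_ORF_BP, amplicon_bp), 2))
```

Across the stated amplicon range the coding fraction runs
from about 0.46 to about 0.51, with about 0.48 at 475 bp.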

Using a strategy predicated on amplifying about
500 bp, it was found that long exons had a higher PCR
failure rate. To address this, the bioinformatics process
was adjusted to amplify 1000, 1500 or 2000 bp fragments
from exons larger than 500 bp. This improved the rate of
successful amplification of exons exceeding 500 bp,
which constitute about 9.2% of the exons predicted by the
gene finding algorithms.

Approximately 75% of the probes disposed on the
array (90% of those that successfully PCR amplified) were
sequence-verified by sequencing in both the forward and
reverse direction using a MegaBACE sequencer (Molecular
Dynamics, Inc., Sunnyvale, CA), universal primers, and
standard protocols.

Some genomic clones (BACs) yielded very poor PCR
and sequencing results. The reasons for this are unclear,
but may be related to the quality of early draft sequence
or the inclusion of vector and host contamination in some
submitted sequence data.

Although the intronic and intergenic material
flanking coding regions could theoretically interfere with
hybridization during microarray experiments, subsequent
empirical results demonstrated that differential expression
ratios were not significantly affected by the presence of
noncoding sequence. The variation in exon size was
similarly found not to affect differential expression
ratios significantly; however, variation in exon size was
observed to affect the absolute signal intensity (data not
shown).

The 350 MB of genomic DNA was, by the
above-described process, reduced to 9750 discrete probes,
which were spotted in duplicate onto glass slides using
commercially available instrumentation (MicroArray GenII
Spotter and/or MicroArray GenIII Spotter, Molecular
Dynamics, Inc., Sunnyvale, CA). Each slide additionally
included either 16 or 32 E. coli genes, the average
hybridization signal of which was used as a measure of
background biological noise.

Each of the probe sequences was BLASTed against
the human EST data set, the NR data set, and SwissProt
GenBank (May 7, 1999 release 2.0.9).

One third of the probe sequences (as amplified)
produced an exact match (BLAST Expect ("E") values less
than 1e-100) to either an EST (20% of sequences) or a known
mRNA (13% of sequences). A further 22% of the probe
sequences showed some homology to a known EST or mRNA
(BLAST E values from 1e-5 to 1e-99). The remaining 45% of
the probe sequences showed no significant sequence homology
to any expressed, or potentially expressed, sequences
present in public databases.
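
The three-way split by BLAST Expect value can be expressed
as a small classifier. The sketch below is illustrative: the
category labels and the function name are assumptions, and
the handling of a value exactly at 1e-5 (treated here as
homologous) is a choice made for the sketch.

```python
def classify_hit(e_value):
    """Classify a probe by the best BLAST Expect value it returned.

    Thresholds follow the ranges stated in the text: below 1e-100 is
    treated as an exact match, 1e-100 through 1e-5 as some homology,
    and anything larger as no significant homology.
    """
    if e_value < 1e-100:
        return "exact match"
    if e_value <= 1e-5:
        return "some homology"
    return "no significant homology"


# Hypothetical best-hit E values for four probes.
for e in (0.0, 1e-120, 1e-50, 0.01):
    print(e, classify_hit(e))
```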

All of the probe sequences (as amplified) were
then analyzed for protein similarities with the SwissProt
database using BLASTX, Gish et al., Nature Genet. 3:266
(1993). The predicted functional breakdowns of the 2/3 of
probes identical or homologous to known sequences are
presented in Table 1.

Table 1

Function of Predicted ORFs As Deduced From Comparative
Sequence Analysis

Total    V6 chip    V7 chip    Function Predicted from
                               Comparative Sequence Analysis

211      96         115        Receptor
120      43         77         Zinc Finger
30       11         19         Homeobox
25       9          16         Transcription Factor
17       11         7          Transcription
118      57         61         Structural
95       39         56         Kinase
36       18         18         Phosphatase
83       31         52         Ribosomal
45       19         26         Transport
21       17         14         Growth Factor
17       12         5          Cytochrome
50       33         17         Channel


As can be seen, the two most common types of 
genes were transcription factors and receptors, making up 
2.2% and 1.8% of the arrayed elements, respectively. 


EXAMPLE 2 

Gene Expression Measurements From Genome-Derived Single 
Exon Microarrays 


The two genome-derived single exon microarrays
prepared according to Example 1 were hybridized in a series
of simultaneous two-color fluorescence experiments to (1)
Cy3-labeled cDNA synthesized from message drawn
individually from each of brain, heart, liver, fetal liver,
placenta, lung, bone marrow, HeLa, BT 474, or HBL 100
cells, and (2) Cy5-labeled cDNA prepared from message
pooled from all ten tissues and cell types, as a control in
each of the measurements. Hybridization and scanning were
carried out using standard protocols and Molecular Dynamics
equipment.

Briefly, mRNA samples were bought from commercial
sources (Clontech, Palo Alto, CA and Amersham Pharmacia
Biotech (APB)). Cy3-dCTP and Cy5-dCTP (both from APB) were
incorporated during separate reverse transcriptions of 1 µg
of polyA+ mRNA performed using 1 µg oligo(dT)12-18 primer
and 2 µg random 9mer primers as follows. After heating to
70°C, the RNA:primer mixture was snap cooled on ice. The
following were then added to the RNA to the stated final
concentration: 1X Superscript II buffer, 0.01 M DTT,
100 µM dATP, 100 µM dGTP, 100 µM dTTP, 50 µM dCTP, 50 µM
Cy3-dCTP or Cy5-dCTP, and 200 U Superscript II
enzyme. The reaction was incubated for 2 hours at 42°C.
The first strand cDNA was then isolated by adding
1 U Ribonuclease H and incubating for 30 minutes at 37°C.
The reaction was then purified using a Qiagen PCR cleanup
column, increasing the number of ethanol washes to 5.
Probe was eluted using 10 mM Tris pH 8.5.

Using a spectrophotometer, probes were measured
for dye incorporation. Volumes of both Cy3 and Cy5 cDNA
corresponding to 50 pmoles of each dye were then dried in a
Speedvac and resuspended in 30 µl hybridization solution
containing 50% formamide, 5X SSC, 0.2 µg/µl poly(dA), 0.2
µg/µl human Cot-1 DNA, and 0.5% SDS.

Hybridizations were carried out under a
coverslip, with the array placed in a humid oven at 42°C
overnight. Before scanning, slides were washed in 1X SSC,
0.2% SDS at 55°C for 5 minutes, followed by 0.1X SSC, 0.2%
SDS, at 55°C for 20 minutes. Slides were briefly dipped in
water and dried thoroughly under a gentle stream of
nitrogen.

Slides were scanned using a Molecular Dynamics
Gen3 scanner, as described. Schena (ed.), Microarray
Biochip: Tools and Technology, Eaton Publishing
Company/BioTechniques Books Division (2000) (ISBN:
1881299376).

Although the use of pooled cDNA as a reference
permitted the survey of a large number of tissues, it
attenuates the measurement of relative gene expression,
since every highly expressed gene in the tissue/cell
type-specific fluorescence channel will be present to a
level of at least 10% in the control channel. Because of
this fact, both signal and expression ratios (the latter
hereinafter, "expression" or "relative expression") for
each probe were normalized using the average signal or
average ratio, respectively, as measured across the whole
slide.

Data were accepted for further analysis only when
signal was at least three times greater than biological
noise, the latter defined by the average signal produced by
the E. coli control genes.
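
The slide-wide normalization and the noise cutoff described
above can be sketched in a few lines. The sketch is
illustrative only: the function names and the example
signal values are assumptions, and "at least three times"
is implemented as a greater-than-or-equal comparison.

```python
from statistics import mean


def normalize_to_slide_mean(values):
    """Divide each value by the average of all values on the slide."""
    slide_mean = mean(values)
    return [v / slide_mean for v in values]


def passes_noise_filter(signal, control_signals, fold=3.0):
    """True when signal is at least `fold` times the mean control signal.

    `control_signals` stands in for the E. coli control spots that define
    biological noise.
    """
    return signal >= fold * mean(control_signals)


# Hypothetical raw signals for three probes and two E. coli control spots.
raw_signals = [1.0, 2.0, 3.0]
controls = [0.2, 0.2]

print(normalize_to_slide_mean(raw_signals))
print([passes_noise_filter(s, controls) for s in (0.5, 0.7, 2.0)])
```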

The relative expression signal for these probes
was then plotted as a function of tissue or cell type, and
is presented in FIG. 6.

FIG. 6 shows the distribution of expression
across a panel of ten tissues. The graph shows the number
of sequence-verified products that were not
expressed ("0"), expressed in one or more but not all
tested tissues ("1" - "9"), or expressed in all tissues
tested ("10").

Of 9999 arrayed elements on the two microarrays
(including positive and negative controls and "failed"
products), 2353 (51%) were expressed in at least one tissue
or cell type. Of the gene elements showing significant
signal (where expression was scored as "significant" if
the normalized Cy3 signal was greater than 1, representing
signal 5-fold over biological noise (0.2)), 39% (991) were
expressed in all 10 tissues. The next most common class
(15%) consisted of gene elements expressed in only a single
tissue.
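
A tally of the kind plotted in FIG. 6 can be computed by
counting, for each probe, the tissues in which its
normalized signal exceeds the significance threshold. The
sketch below is illustrative only; the function names and
the three-tissue toy data are assumptions.

```python
from collections import Counter


def tissues_expressed(profile, threshold=1.0):
    """Number of tissues in which normalized signal exceeds the threshold."""
    return sum(1 for signal in profile if signal > threshold)


def expression_histogram(profiles, n_tissues=10, threshold=1.0):
    """Histogram indexed by number of tissues (0..n_tissues) showing expression."""
    counts = Counter(tissues_expressed(p, threshold) for p in profiles.values())
    return [counts.get(k, 0) for k in range(n_tissues + 1)]


# Toy data: three probes measured in three tissues (hypothetical values).
toy_profiles = {
    "probe_a": [0.1, 0.2, 0.3],   # expressed in no tissue
    "probe_b": [2.0, 0.1, 0.1],   # expressed in one tissue
    "probe_c": [2.0, 2.0, 2.0],   # expressed in all tissues
}
print(expression_histogram(toy_profiles, n_tissues=3))
```

For the toy data the histogram is [1, 1, 0, 1]: one probe in
no tissue, one in a single tissue, none in two, and one in
all three.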

The genes expressed in a single tissue were
further analyzed, and the results of the analyses are
compiled in FIG. 7.

FIG. 7A is a matrix presenting the expression of
all verified sequences that showed expression greater than
3 in at least one tissue. Each clone is represented by a
column in the matrix. Each of the 10 tissues assayed is
represented by a separate row in the matrix, and relative
expression of a clone in that tissue is indicated at the
respective node by intensity of green shading, with the
intensity legend shown in panel B. The top row of the
matrix ("EST Hit") contains "bioinformatic" rather than
"physical" expression data; that is, it presents the results
returned by query of EST, NR and SwissProt databases using
the probe sequence. The legend for "bioinformatic
expression" (i.e., degree of homology returned) is
presented in panel C. Briefly, white is known, black is
novel, with gray depicting nonidentical with significant
homology (white: E values < 1e-100; gray: E values from
1e-05 to 1e-99; black: E values > 1e-05).

As FIG. 7 readily shows, heart and brain were
demonstrated to have the greatest numbers of genes that
were shown to be uniquely expressed in the respective
tissue. In brain, 200 uniquely expressed genes were
identified; in heart, 150. The remaining tissues gave the
following figures for uniquely expressed genes: liver, 100;
lung, 70; fetal liver, 150; bone marrow, 75; placenta, 100;
HeLa, 50; HBL, 100; and BT474, 50.

It was further observed that there were many more
"novel" genes among those that were up-regulated in only
one tissue, as compared with those that were down-regulated
in only one tissue. In fact, it was found that ORFs whose
expression was measurable in only a single one of the
tested tissues were represented in sequencing databases at
a rate of only 11%, whereas 36% of the ORFs whose expression
was measurable in 9 of the tissues were present in public
databases. As for those ORFs expressed in all ten tissues,
fully 45% were present in existing expressed sequence
databases. These results are not unexpected, since genes
expressed in a greater number of tissues have a higher
likelihood of being, and thus of having been, discovered by
EST approaches.


Comparison of Signal from Known and Unknown Genes 

The normalized signal of the genes found to have
high homology to genes present in the GenBank human EST
database was compared to the normalized signal of those
genes not found in the GenBank human EST database. The
data are shown in FIG. 8.

FIG. 8 shows the normalized Cy3 signal intensity
for all sequence-verified products with a BLAST Expect
("E") value of greater than 1e-30 (designated "unknown")
upon query of existing EST, NR and SwissProt databases, and
shows in blue the normalized Cy3 signal intensity for all
sequence-verified products with a BLAST Expect value of
less than 1e-30 ("known"). Note that biological background
noise has an averaged normalized Cy3 signal intensity of
0.2.

As expected, the most highly expressed of the
ORFs were "known" genes. This is not surprising, since
very high signal intensity correlates with very commonly
expressed genes, which have a higher likelihood of being
found by EST sequencing.

However, a significant point is that a large
number of even the high expressers were "unknown". Since
the genomic approach used to identify genes and to confirm
their expression does not bias exons toward either the 3'
or 5' end of a gene, many of these high expression genes
will not have been detected in an end-sequenced cDNA
library.

The significant point is that presence of the
gene in an EST database is not a prerequisite for
incorporation into a genome-derived microarray, and
further, that arraying such "unknown" exons can help to
assign function to as-yet undiscovered genes.

Verification of Gene Expression 

To ascertain the validity of the approach
described above to identify genes from raw genomic
sequence, expression of two of the probes was assayed using
reverse transcriptase polymerase chain reaction (RT PCR)
and northern blot analysis.

Two microarray probes were selected on the basis
of exon size, prior sequencing success, and tissue-specific
gene expression patterns as measured by the microarray
experiments. The primers originally used to amplify the
two respective ORFs from genomic DNA were used in RT PCR
against a panel of tissue-specific cDNAs (Rapid-Scan gene
expression panel 24 human cDNAs) (OriGene Technologies,
Inc., Rockville, MD).

Sequence AL079300_1 was shown by microarray
hybridization to be present in cardiac tissue, and sequence
AL031734_1 was shown by microarray experiment to be present
in placental tissue (data not shown). RT-PCR on these two
sequences confirmed the tissue-specific gene expression as
measured by microarrays, as ascertained by the presence of
a correctly sized PCR product from the respective tissue
type cDNAs.

Clearly, not all microarray results can, or
indeed should, be confirmed by independent assay
methods, or the high throughput, highly parallel advantages
of microarray hybridization assays will be lost. However,
in addition to the two RT-PCR results presented above, the
observation that 1/3 of the arrayed genes exist in
expression databases provides powerful confirmation of the
power of our methodology (which combines bioinformatic
prediction with expression confirmation using
genome-derived single exon microarrays) to identify novel
genes from raw genomic data.

To verify that the approach further provides correct
characterization of the expression patterns of the identified
genes, a detailed analysis was performed of the microarrayed
sequences that showed high signal in brain.

For this latter analysis, sequences that showed high
(normalized) signal in brain, but which showed very low
(normalized) signal (less than 0.5, determined to be biological
noise) in all other tissues, were further studied. There were 82
sequences that fit these criteria, approximately 2% of the
arrayed elements. The 10 sequences showing the highest signal in
brain in microarray hybridizations are detailed in Table 2,
along with assigned function, if known or reasonably predicted.

Table 2

Function of the Most Highly Expressed Genes Expressed Only in Brain

Microarray    Normalized   Expression   Homology to    Gene Function as
Sequence      Signal       Ratio        EST present    described by GenBank
Name                                    in GenBank

AP000217-1    5.2          +7.7         High           S-100 protein, b-chain, Ca2+
                                                       binding protein expressed in
                                                       central nervous system

AP000047-1    2.3                       High           Unknown function

AC006548-9    1.7                       High           Similar to mouse membrane
                                                       glycoprotein M6, expressed in
                                                       central nervous system

AC007245-5    1.5                       High           Similar to amphiphysin, a
                                                       synaptic vesicle-associated
                                                       protein. Ref 21

L44140-4      1.2          +2.0         High           Endothelial actin-binding
                                                       protein found in nonmuscle
                                                       filamin

AC004689-9    1.2          +3.5         High           Protein phosphatase PP2A,
                                                       neuronal/downregulates
                                                       activated protein kinases

AL031657-1    1.2          +3.0         High           Unknown function/contains the
                                                       [illegible], a common protein
                                                       sequence motif

AC009266-2    1.1          +3.7         Low            Low homology to the
                                                       synaptotagmin I protein in
                                                       [illegible] low levels
                                                       throughout rat brain

AP000086-1    1.0          +2.7         Low            Unknown, very poor homology
                                                       to collagen

AC004689-3    1.0                       High           Protein phosphatase PP2A,
                                                       neuronal/downregulates
                                                       activated protein kinases


Of the ten sequences studied by these latter confirmatory
approaches, eight were previously known. Of these eight, six had
previously been reported to be important in the central nervous
system or brain. The exon giving the highest signal (AP000217-1)
was found to be from the gene encoding an S100B Ca2+ binding
protein, reported in the literature to be highly and uniquely
expressed in the central nervous system. Heizmann, Neurochem.
Res. 9:1097 (1997).

A number of the brain-specific probe sequences (including
AC006548-9 and AC009266-2) did not have homology to any known
human cDNAs in GenBank but did show homology to rat and mouse
cDNAs. Sequences AC004689-9 and AC004689-3 were both found to be
phosphatases present in neurons (Millward et al., Trends
Biochem. Sci. 24(5):186-191 (1999)). Two microarray sequences,
AP000047-1 and AP000086-1, have unknown function, with
AP000086-1 being absent from GenBank. Functionality can now be
narrowed down to a role in the central nervous system for both
of these genes, showing the power of designing microarrays in
this fashion.

Next, the function of the chip sequences with the highest
(normalized) signal intensity in brain, regardless of expression
in other tissues, was assessed. In this latter analysis, we
found expression of many more common genes, since the sequences
were not limited to those expressed only in brain. For example,
looking at the 20 highest signal intensity spots in brain, 4
were similar to tubulin (AC00807905; AF146191-2; AC007664-4;
AF14191-2), 2 were similar to actin (AL035701-2; AL034402-1),
and 6 were found to be homologous to glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) (AL035604-1; Z86090-1; AC006064-L,
AC006064-K; AC035604-3; AC006064-L). These genes are often used
as controls or housekeeping genes in microarray experiments of
all types.

Other interesting genes highly expressed in brain included one
encoding a ferritin heavy chain protein, which is reported in
the literature to be found in brain and liver (Joshi et al., J.
Neurol. Sci. 134 (Suppl): 52-56 (1995)), a result duplicated
with the array. Other highly expressed chip sequences included a
translation elongation factor 10 (AC007564-4), a DEAD-box
homolog (AL023804-4), and a Y-chromosome RNA-binding motif (Chai
et al., Genomics 49(2):283-89 (1998)) (AC007320-3). A low
homology analog (AP00123-1/2) to a gene, DSCR1, thought to be
involved in trisomy 21 (Down's syndrome), showed high expression
in both brain and heart, in agreement with the literature
(Fuentes et al., Mol. Genet. 4(10):1935-44 (1995)).

As a further validation of the approach, we selected the BAC
AC006064 to be included on the array. This BAC was known to
contain the GAPDH gene, and thus could be used as a control for
the ORF selection process. The gene finding and exon selection
algorithms resulted in choosing 25 exons from BAC AC006064 for
spotting onto the array, of which four were drawn from the GAPDH
gene. Table 3 compares the average expression ratio for the 4
exons from BAC AC006064 with the average expression ratio for 5
different dilutions of a commercially available GAPDH cDNA
(Clontech).



Table 3

Comparison of Expression Ratio, for each tissue, of GAPDH

Tissue         AC006064 (n = 4)    Control (n = 5)

Bone Marrow    -1.81 ± 0.11        -1.85 ± 0.08
Brain          -1.41 ± 0.11        -1.17 ± 0.05
BT474           1.85 ± 0.09         1.66 ± 0.12
Fetal Liver    -1.62 ± 0.07        -1.41 ± 0.05
HBL100          1.32 ± 0.05         2.64 ± 0.12
Heart           1.16 ± 0.09         1.56 ± 0.10
HeLa            1.11 ± 0.06         1.30 ± 0.15
Liver          -1.62 ± 0.22        -2.07 ± [illegible]
Lung           -4.95 ± 0.93        -3.75 ± 0.21
Placenta       -3.56 ± 0.25        -3.52 ± 0.43


Each tissue shows excellent agreement between the experimentally
chosen exons and the control, again demonstrating the validity
of the present exon mining approach. In addition, the data also
show the variability of expression of GAPDH within tissues,
calling into question its classification as a housekeeping gene
and utility as a housekeeping control in microarray experiments.

EXAMPLE 3

Representation of Sequence and Expression Data as a "Mondrian"

For each genomic clone processed for microarray as
above-described, a plethora of information was accumulated,
including full clone sequence, probe sequence within the clone,
results of each of the three gene finding programs, EST
information associated with the probe sequences, and microarray
signal and expression for multiple tissues, challenging our
ability to display the information.

Accordingly, we devised a new tool for visual display of the
sequence with its attendant annotation which, in deference to
its visual similarity to the paintings of Piet Mondrian, is
hereinafter termed a "Mondrian". FIGS. 3 and 4 present the key
to the information presented on a Mondrian.

FIG. 9 presents a Mondrian of BAC AC008172 (bases 25,000 to
130,000 shown), containing the carbamyl phosphate synthetase
gene (AF154830.1). Purple background within the region shown as
field 81 in FIG. 3 indicates all 37 known exons for this gene.

As can be seen, GRAIL II successfully identified 27 of the known
exons (73%), GENEFINDER successfully identified 37 of the known
exons (100%), while DICTION identified 7 of the known exons
(19%).
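
The percentages quoted above follow directly from the exon counts. As a minimal sketch (the variable and function names are illustrative and not part of the original analysis), the arithmetic can be reproduced as follows:

```python
# Number of known exons in the carbamyl phosphate synthetase gene (from the text).
KNOWN_EXONS = 37

# Known exons identified by each gene finding program (from the text).
identified = {"GRAIL II": 27, "GENEFINDER": 37, "DICTION": 7}


def percent_identified(found, total=KNOWN_EXONS):
    """Return the share of known exons found, rounded to a whole percent."""
    return round(100 * found / total)


percentages = {name: percent_identified(n) for name, n in identified.items()}

for name, pct in percentages.items():
    print(f"{name}: {identified[name]}/{KNOWN_EXONS} known exons ({pct}%)")
```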

Seven of the predicted exons were selected for physical assay,
of which 5 successfully amplified by PCR and were sequenced.
These five exons were all found to be from the same gene, the
carbamyl phosphate synthetase gene (AF154830.1).

The five exons were arrayed, and gene expression was measured
across 10 tissues. As is readily seen in the Mondrian, the five
chip sequences on the array show identical expression patterns,
elegantly demonstrating the reproducibility of the system.

FIG. 10 is a Mondrian of BAC AL049839. We selected 12 exons from
this BAC, of which 10 were successfully sequenced and were found
to form between 5 and 6 genes. Interestingly, 4 of the genes on
this BAC are protease inhibitors. Again, these data elegantly
show that exons selected from the same gene show the same
expression patterns, depicted below the red line. From this
figure, it is clear that our ability to find known genes is very
good. A novel gene is also found from 86.6 kb to 88.6 kb, upon
which all the exon finding programs agree. We are confident we
have two exons from a single gene since they show the same
expression patterns and the exons are proximal to each other.
Backgrounds in the following colors indicate a known gene (top
to bottom): red = kallistatin protease inhibitor (P29622);
purple = plasma serine protease inhibitor (P05154);
turquoise = α1-antichymotrypsin (P01011); mauve = 40S ribosomal
protein (P08865). Note that chip sequences 8 and 12 did not
sequence-verify.

EXAMPLE 4 

Genome-Derived Single Exon Probes Useful For Measuring Human
Gene Expression

The protocols set forth in Examples 1 and 2, supra, were applied
to additional human genomic sequence as it became newly
available in GenBank to identify unique exons in the human
genome that could be shown to be expressed at significant levels
in bone marrow tissue.

These unique exons are within longer probe sequences. Each probe
was completely sequenced on both strands prior to its use on a
genome-derived single exon microarray; sequencing confirms the
exact chemical structure of each probe. An added benefit of
sequencing is that it placed us in possession of a set of single
base-incremented fragments of the sequenced nucleic acid,
starting from the sequencing primer 3' OH. (Since the single
exon probes were first obtained by PCR amplification from
genomic DNA, we were of course additionally in possession of an
even larger set of single base-incremented fragments of each of
the 13,114 single exon probes, each fragment corresponding to an
extension product from one of the two amplification primers.)

The structures of the 13,114 unique single exon probes are
clearly presented in the Sequence Listing as SEQ ID NOs.: 1 -
13,114. The 16 nt 5' primer sequence and 16 nt 3' primer
sequence present on the amplicon are not included in the
sequence listing. The sequences of the exons present within each
of these probes are presented in the Sequence Listing as SEQ ID
NOs.: 13,115 - 26,012, respectively. It will be noted that some
amplicons have more than one exon, and some exons are contained
in more than one amplicon.

As detailed in Example 2, expression was demonstrated by
disposing the amplicons as single exon probes on nucleic acid
microarrays and then performing two-color fluorescent
hybridization analysis; significant expression is based on a
statistical confidence that the signal is significantly greater
than negative biological control spots. The negative biological
control is formed from spotted DNA sequences from a different
species. Here, 32 sequences from E. coli were spotted in
duplicate to give a total of 64 spots.

For each hybridisation (each slide, each colour), the median
value of the signal from all of the spots is determined. The
normalised signal value is the arithmetic mean of the signal
from duplicate spots divided by the population median.

Control spots are eliminated if there is more than a five-fold
difference between the raw signals of the duplicate spots.

The median of the signal from the remaining control spots is
calculated, and all subsequent calculations are done with
normalised signals.
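
The normalisation and duplicate-concordance steps just described can be sketched as follows. This is a minimal illustration, not the software actually used: the spot names, the example signal values, and the decision to drop control pairs containing a zero signal are all assumptions introduced here.

```python
from statistics import mean, median


def population_median(spots):
    """Median raw signal over every spot of one hybridisation and channel."""
    return median(value for pair in spots.values() for value in pair)


def normalised_signals(spots):
    """Mean of each duplicate pair divided by the population median."""
    pop_median = population_median(spots)
    return {name: mean(pair) / pop_median for name, pair in spots.items()}


def drop_discordant_controls(controls, max_fold=5.0):
    """Keep control pairs whose raw duplicate signals differ by at most max_fold.

    Assumption: a pair with a zero (or negative) raw signal is dropped,
    because a fold difference cannot be computed for it.
    """
    kept = {}
    for name, (first, second) in controls.items():
        low, high = sorted((first, second))
        if low > 0 and high / low <= max_fold:
            kept[name] = (first, second)
    return kept


# Hypothetical raw duplicate signals for one slide and one colour.
spots = {
    "probe_1": (200.0, 240.0),
    "probe_2": (90.0, 110.0),
    "ctrl_1": (40.0, 60.0),
    "ctrl_2": (10.0, 80.0),  # eight-fold difference between duplicates
}
controls = {name: pair for name, pair in spots.items() if name.startswith("ctrl")}

normalised = normalised_signals(spots)
kept_controls = drop_discordant_controls(controls)
```

In this example the population median is 85, so probe_1 has a normalised signal of 220/85, and ctrl_2 is discarded because its duplicates differ more than five-fold.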

Control spots having a signal of greater than median + 2.4 (the
value 2.4 is roughly 12 times the observed standard deviation of
control spot populations) are eliminated. Spots with such high
signals are considered to be "outliers".

The mean and standard deviation of the modified control spot
populations are calculated.

The mean + 3x the standard deviation (mean + (3*SD)) is used as
the signal threshold qualifier for that particular
hybridisation. Thus, individual thresholds are determined for
each channel and each hybridisation.

This means that, assuming that the data are distributed
normally, there is a 99% confidence that any signal exceeding
the threshold is significant.
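
The outlier removal and threshold calculation above can be sketched as follows. This is an illustration under stated assumptions: the control values are invented, and the sample standard deviation is used because the text does not say whether a sample or population standard deviation was computed.

```python
from statistics import NormalDist, mean, median, stdev

OUTLIER_MARGIN = 2.4  # from the text: controls above median + 2.4 are outliers
SD_MULTIPLIER = 3     # from the text: threshold is mean + 3 * SD


def remove_outliers(control_signals):
    """Drop normalised control signals greater than median + OUTLIER_MARGIN."""
    cutoff = median(control_signals) + OUTLIER_MARGIN
    return [s for s in control_signals if s <= cutoff]


def signal_threshold(control_signals):
    """Mean + 3 SD of the outlier-filtered control spot population.

    Assumption: sample standard deviation (statistics.stdev).
    """
    kept = remove_outliers(control_signals)
    return mean(kept) + SD_MULTIPLIER * stdev(kept)


# Hypothetical normalised control signals for one channel of one hybridisation.
control_signals = [0.20, 0.30, 0.25, 0.35, 0.30, 4.00]

kept = remove_outliers(control_signals)
threshold = signal_threshold(control_signals)

# Under a normal model, the fraction of control signals expected to fall
# below mean + 3 SD; this is the basis for the confidence statement above.
fraction_below = NormalDist().cdf(SD_MULTIPLIER)
```

With these example values the 4.00 control is discarded as an outlier and the threshold is about 0.45; a probe would be called significantly expressed in that channel only if its normalised signal exceeded this value.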

The probes and their expression data are presented in Table 4,
set forth in Example 5. Example 5 presents the subset of probes
that is significantly expressed in human bone marrow and thus
presents the subset of probes that was recognized to be useful
for measuring expression of their cognate genes in human bone
marrow tissue.

The sequence of each of the exon probes identified by SEQ ID
NOs.: 13,115 - 26,012 was individually used as a BLAST (or, for
SWISSPROT, BLASTX) query to identify the most similar sequence
in each of dbEST, SWISSPROT (BLASTX), and NR divisions of
GenBank. Because the query sequences are themselves derived from
genomic sequence in GenBank, only nongenomic hits from NR were
scored.

The smallest in value of the BLAST (or BLASTX) expect ("E")
scores for each query sequence across the three database
divisions was used as a measure of the "expression novelty" of
the probe's ORF. Table 4 is sorted in descending order based on
this measure, reported as "Most Similar (top) Hit BLAST E
Value". Those sequences for which no "Hit E Value" is listed are
those exons which were found to have no similar sequences.
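
The "expression novelty" measure and the resulting sort order can be sketched as follows. The probe names and E values are invented for illustration, and placing exons with no hit at the top of the sorted list (as the least similar entries) is an assumption; the text does not state where such entries appear.

```python
def novelty_score(hits):
    """Smallest E value across the three databases, or None if there is no hit."""
    values = [e for e in hits.values() if e is not None]
    return min(values) if values else None


def sort_by_novelty(probes):
    """Sort probes in descending order of their most similar hit E value.

    Assumption: probes with no hit in any database are treated as the
    least similar and sort first.
    """
    def key(item):
        score = novelty_score(item[1])
        return float("inf") if score is None else score

    return sorted(probes.items(), key=key, reverse=True)


# Hypothetical best E values per database for three probes.
probes = {
    "probe_a": {"dbEST": 1e-40, "SWISSPROT": 2e-10, "NR": None},
    "probe_b": {"dbEST": None, "SWISSPROT": None, "NR": None},
    "probe_c": {"dbEST": 6.0e-01, "SWISSPROT": None, "NR": 9.0e-01},
}

ordered_names = [name for name, _ in sort_by_novelty(probes)]
```

Here probe_b (no hits) comes first, followed by probe_c (best E value 6.0E-01) and then probe_a (best E value 1e-40).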

As sorted, Table 4 thus lists its respective probes (by
"AMPLICON SEQ ID NO.:" and additionally by the SEQ ID NO. of the
exon contained within the probe: "EXON SEQ ID NO.:") from least
similar to sequences known to be expressed (i.e., highest BLAST
E value), at the beginning of the table, to most similar to
sequences known to be expressed (i.e., lowest BLAST E value), at
the bottom of the table.

Table 4 further provides, for each listed probe, the accession
number of the database sequence that yielded the "Most Similar
(top) Hit BLAST E Value", along with the name of the database in
which the database sequence is found ("Top Hit Database
Source").

Table 4 further provides SEQ ID NOs. corresponding to the
predicted amino acid sequences where they have been determined
for the probe and exon nucleotide sequences. These are set out
as PEPTIDE SEQ ID NOs. The peptide sequences for a given exon
are predicted as follows. Since each chip exon is a consensus
sequence drawn from predictions from various exon finding
programs (i.e., Grail, GeneFinder and GenScan), the multiple
initial ORFs are first determined in a uniform way according to
each prediction. In particular, the reading frame for predicting
the first amino acid in the peptide sequence always starts with
the first base of any codon and ends with the last base of a
non-termination codon. Next, for each strand of the exon,
initial ORFs are merged into one or more final ORFs in an
exhaustive process based on the following criteria: 1) the
merging ORFs must be overlapping, and 2) the merging ORFs must
be in the same frame.
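
The two merging criteria can be illustrated with a short sketch for a single strand. The coordinate convention is an assumption made here for illustration: each initial ORF is a half-open interval (start, end) in 0-based strand coordinates that begins on the first base of a codon, so two ORFs share a reading frame when their starts are congruent modulo 3. ORFs that merely touch end to start are not treated as overlapping.

```python
from collections import defaultdict


def merge_orfs(orfs):
    """Merge initial ORFs on one strand into final ORFs.

    Two ORFs are merged only if they overlap and are in the same reading
    frame; merging is repeated until no further merge is possible.
    """
    by_frame = defaultdict(list)
    for start, end in orfs:
        by_frame[start % 3].append((start, end))

    merged = []
    for frame in sorted(by_frame):
        current_start, current_end = None, None
        for start, end in sorted(by_frame[frame]):
            if current_start is not None and start < current_end:
                # Overlaps the ORF being built in this frame: extend it.
                current_end = max(current_end, end)
            else:
                if current_start is not None:
                    merged.append((current_start, current_end))
                current_start, current_end = start, end
        merged.append((current_start, current_end))
    return sorted(merged)


# Hypothetical initial ORFs predicted by different programs for one strand.
initial_orfs = [(0, 30), (21, 60), (10, 40), (90, 120)]
final_orfs = merge_orfs(initial_orfs)
```

In this example (0, 30) and (21, 60) overlap in frame 0 and merge into (0, 60); (10, 40) overlaps them but lies in a different frame and stays separate, as does the non-overlapping (90, 120).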

The Sequence Listing, which is a superset of all of the data
presented in Table 4, further includes, for each probe, the most
similar hit, with accession number and BLAST E value, from each
of the three queried databases.

Table 4 further lists, for each probe, a portion of the
descriptor for the top hit ("Top Hit Descriptor") as provided in
the sequence database. For those ORFs that are similar in
sequence, but nonidentical to known sequences (e.g., those with
BLAST E values between about 1e-05 and 1e-100), the descriptor
reveals the likely function of the protein encoded by the
probe's ORF.

Using BLAST E value cutoffs of 1e-05 (i.e., 1 x 10^-5) and
1e-100 (i.e., 1 x 10^-100) as evidence of similarity to
sequences known to be expressed is of course arbitrary: in
Example 2, supra, a BLAST E value of 1e-30 was used as the
boundary when only two classes were to be defined for analysis
(unknown, >1e-30; known, <1e-30) (see also FIG. 8).
Furthermore, even when the "Most Similar (Top) Hit BLAST E
Value" is low, e.g., less than about 1e-100, which is probative
evidence that the query sequence has previously been shown to be
expressed, the top hit is highly unlikely exactly to match the
probe sequence.

First, such expression entries typically will not have the
intronic and/or intergenic sequence present within the single
exon probes listed in the Table. Second, even the ORF itself is
unlikely in such cases to be present identically in the
databases, since most of the EST and mRNA clones in existing
databases include multiple exons, without any indication of the
location of exon boundaries.
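
The E value cutoffs discussed above can be expressed as simple classification rules. The sketch below is illustrative only: the band labels are wording chosen here rather than terms from the original, and the treatment of an E value exactly equal to a boundary, or of an exon with no hit, is an assumption.

```python
def two_class(e_value, boundary=1e-30):
    """Two-class split used in Example 2: known (<1e-30) versus unknown.

    Assumption: no hit, or an E value equal to the boundary, counts as unknown.
    """
    if e_value is None:
        return "unknown"
    return "known" if e_value < boundary else "unknown"


def similarity_band(e_value, weak=1e-05, strong=1e-100):
    """Three-band reading of the 1e-05 and 1e-100 cutoffs (labels are illustrative)."""
    if e_value is None or e_value > weak:
        return "no significant similarity"
    if e_value < strong:
        return "previously shown to be expressed"
    return "similar but nonidentical"


examples = [6.0e-01, 1e-20, 1e-50, 1e-120, None]
classified = [(e, two_class(e), similarity_band(e)) for e in examples]
```

Because the cutoffs are acknowledged to be arbitrary, they are exposed as parameters rather than fixed constants.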

As noted, the data presented in Table 4 represent a proper
subset of the data present within the attached sequence listing.
For each amplicon probe (SEQ ID NOs.: 1 - 13,114) and probe exon
(SEQ ID NOs.: 13,115 - 26,012, respectively), the sequence
listing further provides, through iterated annotation fields
<220> and <223>:

(a) the accession number of the BAC from which the sequence was
derived ("MAP TO"), thus providing a link to the chromosomal map
location and other information about the genomic milieu of the
probe sequence;

(b) the most similar sequence provided by BLAST query of the EST
database, with accession number and BLAST E value for the "hit";

(c) the most similar sequence provided by BLAST query of the
GenBank NR database, with accession number and BLAST E value for
the "hit"; and

(d) the most similar sequence provided by BLASTX query of the
SWISSPROT database, with accession number and BLAST E value for
the "hit".

EXAMPLE 5

Genome-Derived Single Exon Probes Useful For Measuring
Expression of Genes in Human Bone Marrow

Table 4 (546 pages) presents expression, homology, and
functional information for the genome-derived single exon probes
that are expressed significantly in human bone marrow.

1. A spatially-addressable set of single exon nucleic acid
probes for measuring gene expression in a sample derived from
human bone marrow comprising a plurality of single exon nucleic
acid probes, said probes comprising any one of the nucleotide
sequences set out in SEQ ID NOs: 1 - 13,114 or a complementary
sequence, or a portion of such a sequence.

2. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 1 wherein each of said plurality of
probes is separately and addressably amplifiable.

3. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 1 wherein each of said plurality of
probes is separately and addressably isolatable from said
plurality.

4. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 3 wherein said probes
comprise any one of the nucleotide sequences set out in SEQ ID
NOs.: 13,115 - 26,012.

5. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 4, wherein each of said
plurality of probes is amplifiable using at least one common
primer.

6. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 5 wherein the set
comprises between 50 - 20,000 single exon nucleic acid probes.

7. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 6, wherein the average
length of the single exon nucleic acid probes is between 200 and
500 bp.

8. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 7, wherein at least 50%
of said single exon nucleic acid probes lack prokaryotic and
bacteriophage vector sequence.

9. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 8, wherein at least 50%
of said single exon nucleic acid probes lack homopolymeric
stretches of A or T.

10. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1-9 characterised in that
said set of probes is addressably disposed upon a substrate.

11. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 10 wherein said substrate is selected
from glass, amorphous silicon, crystalline silicon and plastic.

12. A microarray comprising a spatially addressable set of
single exon nucleic acid probes as claimed in any of claims
1-11.

13. A single exon nucleic acid probe for measuring human gene
expression in a sample derived from human bone marrow comprising
a nucleotide sequence as set out in any of SEQ ID NOs.: 1 -
13,114 or a complementary sequence or a fragment thereof wherein
said probe hybridizes at high stringency to a nucleic acid
molecule expressed in the human bone marrow.

14. A single exon nucleic acid probe as claimed in claim 13
comprising a nucleotide sequence as set out in any of SEQ ID
NOs.: 13,115 - 26,012 or a complementary sequence or a fragment
thereof.

15. A single exon nucleic acid probe for measuring human gene
expression in a sample derived from human bone marrow which is a
nucleic acid molecule having a sequence encoding a peptide
comprising a peptide sequence as set out in any of SEQ ID NOs.:
26,013 - 38,628, or a complementary sequence or a fragment
thereof wherein said probe hybridizes at high stringency to a
nucleic acid expressed in the human bone marrow.

16. A single exon nucleic acid probe as claimed in any one of
claims 13 to 15 wherein said single exon nucleic acid probe
comprises between 15 and 25 contiguous nucleotides of said SEQ
ID NO.

17. A single exon nucleic acid probe as claimed in any one of
claims 13 to 15, wherein said probe is between 3 - 25 kb in
length.

18. A single exon nucleic acid probe as claimed in any one of
claims 13 - 17, wherein said probe is DNA, RNA or PNA.

19. A single exon nucleic acid probe as claimed in any one of
claims 13 - 18, wherein said probe is detectably labeled.

20. A single exon nucleic acid probe as claimed in any one of
claims 13 - 19, wherein said probe lacks prokaryotic and
bacteriophage vector sequence.

21. A single exon nucleic acid probe as claimed in any one of
claims 13 - 20, wherein said probe lacks homopolymeric stretches
of A or T.

22. A method of measuring gene expression in a sample derived
from human bone marrow, comprising:

    contacting the microarray of claim 12 with a first
        collection of detectably labeled nucleic acids, said
        first collection of nucleic acids derived from mRNA of
        human bone marrow; and then
    measuring the label detectably bound to each probe of said
        microarray.

23. A method of identifying exons in a eukaryotic genome,
comprising:

    algorithmically predicting at least one exon from genomic
        sequence of said eukaryote; and then
    detecting specific hybridization of detectably labeled
        nucleic acids to a single exon probe,

wherein said detectably labeled nucleic acids are derived from
mRNA from the bone marrow of said eukaryote, said probe is a
single exon probe having a fragment identical in sequence to, or
complementary in sequence to, said predicted exon, said probe is
included within a microarray according to claim 12, and said
fragment is selectively hybridizable at high stringency.

24. A method of assigning exons to a single gene, comprising:

    identifying a plurality of exons from genomic sequence
        according to the method of claim 23; and then
    measuring the expression of each of said exons in a
        plurality of tissues and/or cell types using
        hybridization to single exon microarrays having a probe
        with said exon,

wherein a common pattern of expression of said exons in said
plurality of tissues and/or cell types indicates that the exons
should be assigned to a single gene.

25. A nucleic acid sequence as set out in any of SEQ ID NOs: 1 -
26,012 which encodes a peptide.

26. A peptide encoded by a sequence as set out in any of SEQ ID
NOs: 1 - 26,012.

27. A peptide comprising a sequence as set out in any of SEQ ID
NOs: 26,013 - 38,628.
Top Hit Descriptor

GENOME POLYPROTEIN [CONTAINS: CAPSID PROTEIN C (CORE PROTEIN); MATRIX PROTEIN (ENVELOPE GLYCOPROTEIN M); MAJOR ENVELOPE PROTEIN E; NONSTRUCTURAL PROTEINS NS1, NS2A, NS2B, NS4A AND NS4B; HELICASE (NS3); RNA-DIRECTED RNA POLYMERASE (NS5)]
Dictyostelium discoideum non-LTR retrotransposon TRE5-B, polyprotein (gag) and group-specific antigen (pd) genes, complete cds
Human hereditary haemochromatosis region, histone 2A-like protein gene, hereditary haemochromatosis
tq77a12.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2214814 3' similar to gb:X14723


Campylobacter jejuni kanamycin phosphotransferase (aphA-7) gene, complete cds
Homo sapiens chromosome 21 segment HS21C102
ws32e10.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2498922 3' similar to SW:TRXB_HUMAN Q16881 THIOREDOXIN REDUCTASE;
Homo sapiens hypothetical protein FLJ20707 (FLJ20707), mRNA
DIHYDROPYRIMIDINASE (DHPASE) (HYDANTOINASE) (DHP)
Mus musculus desmin gene
Elaeis oleifera sesquiterpene synthase mRNA, complete cds
pea seed-borne mosaic virus complete genome
pea seed-borne mosaic virus complete genome
Homo sapiens G-protein coupled receptor 14 (GPR14) gene, complete cds
Homo sapiens mRNA for KIAA0874 protein, partial cds
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 63
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 63
CONJUGAL TRANSFER PROTEIN TRBE PRECURSOR
Homo sapiens LHX3 gene, intron 2
Top Hit Descriptor

Mus musculus subtilisin-like serine protease LPC (PC7) gene, exons 1 to 9, partial cds
MR0-FT0175-050900-203-g06_1 FT0175 Homo sapiens cDNA
Homo sapiens LHX3 gene, intron 2
Rattus rattus cardiac AE3 gene, exons 1-23
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 21
Homo sapiens post-synaptic density 95 (DLG4) gene, complete cds
T.pinnatum chloroplast rbcL gene, partial
HISTIDINE-RICH PROTEIN PRECURSOR (CLONE PFHRP-III)
HISTIDINE-RICH PROTEIN PRECURSOR (CLONE PFHRP-III)
HISTIDINE-RICH PROTEIN PRECURSOR (CLONE PFHRP-III)
Human extracellular calcium-sensing receptor mRNA, complete cds
Homo sapiens zinc finger protein ZNF191 (ZNF191) gene, complete cds
D.hydei ayl repeat cluster DNA, fragment D
gb|M87935|HUMAALU472 Human carcinoma cell-derived Alu RNA transcript, (rRNA); gb:J04970 CARBOXYPEPTIDASE M PRECURSOR (HUMAN);
Top Hit Descriptor

Homo sapiens Notch3 (NOTCH3) gene, exons 26, 27, and 28
Yaba monkey tumor virus DNA, BamH1 restriction fragment E, M and partial C, partial and complete cds
D(2) DOPAMINE RECEPTOR
MACROPHAGE-STIMULATING PROTEIN RECEPTOR PRECURSOR (MSP RECEPTOR) (P185-RON) (CDW136) (CD136 ANTIGEN)
Strongylocentrotus purpuratus kinesin light chain isoform 2 mRNA, complete cds
Strongylocentrotus purpuratus kinesin light chain isoform 2 mRNA, complete cds
Homo sapiens partial LMO1 gene for LIM domain only 1 protein, exon 1
SEGMENTATION PROTEIN FUSHI TARAZU
SEGMENTATION PROTEIN FUSHI TARAZU
Homo sapiens genes for leukotriene B4 receptor BLT2, leukotriene B4 receptor BLT1, complete cds
PEROXISOMAL MEMBRANE PROTEIN PER9 (PEROXIN-3)
RC2-FN0094-190700-017-d08 FN0094 Homo sapiens cDNA
lf08f07.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2095621 3'
Homo sapiens nuclear factor (erythroid-derived 2)-like 3 (NFE2L3), mRNA
Mus musculus cGMP-inhibited phosphodiesterase (Pde3a), mRNA
RC1-HT0375-030500-015-o03 HT0375 Homo sapiens cDNA
Haemophilus influenzae Rd section 16 of 163 of the complete genome
Homo sapiens chromosome 21 segment HS21C067
Homo sapiens chromosome 21 segment HS21C067
Rattus norvegicus cenexin 2 mRNA, partial cds
Homo sapiens low density lipoprotein receptor-related protein II (LRP2) gene, exon 1 and partial cds
Homo sapiens gene for histamine H2 receptor, promoter region and complete cds
Synechocystis sp. PCC6803 complete genome, 13/27, 1576593-1719643
Legionella pneumophila gene for iron superoxide dismutase, complete cds
Chlamydia trachomatis strain KAJW31/Cx major outer membrane protein (omp1) gene, complete cds


Top Hit Database Source

NT
NT
SWISSPROT
EST_HUMAN
NT
SWISSPROT
NT
NT
NT
SWISSPROT
SWISSPROT
NT
SWISSPROT
EST_HUMAN
EST_HUMAN
NT
NT
EST_HUMAN
NT
NT
NT
NT
NT
NT
NT
NT
NT


Top Hit Accession No.

AF058895.1
AB025319.1
P20288
AW139713.1
U38813.1
Q04912
L10234.1
L10234.1
AJ277661.1
P02835
P02835
[illegible]
Q01497
BE837779.1
AI420623.1
11421663
9055303
BE157617.1
U32701.1
AL163267.2
AL163267.2
AF065440.2
AB023486.1
D12922.1
Most Similar (Top) Hit BLAST E Value

6.0E-01
6.0E-01
6.0E-01
6.0E-01
6.0E-01
8.0E-01
6.0E-01
6.0E-01
6.0E-01
8.0E-01
6.0E-01
8.0E-01
[illegible]
6.0E-01
[illegible]
6.0E-01
[illegible]
6.0E-01
5.9E-01
[illegible]
5.9E-01
5.9E-01
5.9E-01
5.9E-01
5.9E-01
5.9E-01


Expression 
Signal 


1.61 1 


§ 
d 


! 2-141 


I 2.22| 


2.68 


0.67 


0.78 1 


0.78| 


5.51 1 


4.55| 


4.55| 


1.84 


1.66 1 


0.48I 


I 2.79| 


1.87 


! 4.71 1 


■M- 

m 


! 0.07I 


LO 

CD 


| 4.95| 


§ 


1.45 


I 2.44[ 


I 0.46| 


I 0.48| 


0.89 


ORFSEQ 
ID NO: 




30187 


i 31327| 


| 31547| 


33022 


33161 


33539| 


i 33540| 


| 33898) 


j 34847 1 


j 34848 1 


36569 






| 38345| 


S 31788 


j 31522 




i 27005| 


| 29256) 


t 29257| 




32943 


i 33803 1 


CO 

1 


[ 35375| 


36314 


Exon 
SEQ ID 

NO: 


| 17246| 


17308 


( 184681 


1 186141 


19746 


19872 


20211| 


20211| 


20539| 


21 430 | 


21430| 


23107 


23558| 


23671 | 


| 24760' 


I 25322 


25777| 


2571 5 | 


14053 1 


16337 1 


16337! 


17279 1 


19667 


20447! 


21301) 


I 21951 1 


22853 


Probe 


SEQ ID 
NO: 


4217 


4279 


| 5353| 


| 5514| 


§ 


6818 
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I 7577| 


| 8461 | 


| 8461 1 


10182 


| 10636| 


o> 
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| 11878! 
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| 1002| 


| 3283 1 


| 3283 1 


| 4250 | 
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Top Hit Descriptor 


MITOCHONDRIAL TRIFUNCTIONAL ENZYME ALPHA SUBUNIT PRECURSOR (TP-ALPHA) 
[INCLUDES: LONG-CHAIN ENOYL-COA HYDRATASE ; LONG CHAIN 3-HYDROXYACYL-COA 
DEHYDROGENASE] 
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CD 


NITRATE REDUCTASE [NADPH] (NR)


QV4-BT0536-271299-059-h04 BT0536 Homo sapiens cDNA


LAMININ ALPHA-2 CHAIN PRECURSOR (LAMININ M CHAIN) (MEROSIN HEAVY CHAIN)


LAMININ ALPHA-2 CHAIN PRECURSOR (LAMININ M CHAIN) (MEROSIN HEAVY CHAIN)


Wi37g04.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2427126 3' similar to gb:M13452 LAMIN A
(HUMAN);


Homo sapiens HLA class III region containing tenascin X (tenascin-X) gene, partial cds; cytochrome P450 21-hydroxylase (CYP21B), complement component C4(C4B) G11, helicase (SKI2W), RD, complement factor B (Bf), and complement component C2 (C2) genes,>


Brassica oleracea var. capitata phospholipase D2 (PLD2) gene, complete cds


Brassica oleracea var. capitata phospholipase D2 (PLD2) gene, complete cds


Homo sapiens protein tyrosine phosphatase, receptor-type, zeta polypeptide 1 (PTPRZ1) mRNA


Homo sapiens protein tyrosine phosphatase, receptor-type, zeta polypeptide 1 (PTPRZ1) mRNA


Homo sapiens secreted C-type lectin precursor (LSLCL) gene, complete cds


Mycoplasma genitalium section 9 of 51 of the complete genome


zu42h12.y5 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:740711 5'
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zr42g09.r1 Soares JMhHMPu_S1 Homo sapiens cDNA clone I MAGE: 6661 12 5' | 


7e73c12.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:3288118 3' similar to gb:J02783
PROTEIN DISULFIDE ISOMERASE PRECURSOR (HUMAN);


7e73c12.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:3288118 3' similar to gb:J02783
PROTEIN DISULFIDE ISOMERASE PRECURSOR (HUMAN);


Roridula gorgonias ribulose 1,5-bisphosphate carboxylase (rbcL) gene, partial cds; chloroplast gene for
chloroplast product


7q71c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE: 3' similar to contains element MER29
repetitive element ;


7q71c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE: 3' similar to contains element MER29
repetitive element ;


Top Hit 
Database 
Source 


SWISSPROT 


EST_HUMAN


SWISSPROT


EST_HUMAN


SWISSPROT


SWISSPROT


EST_HUMAN


NT


NT


NT


NT
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ESTJHUMAN | 


ESTHUMAN | 
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LU 


Top Hit Accession
No.


Q64428


BF572536.1


P36858


AW373694.1


ID 

i 

CD 

a 


Q60675


AI858398.1


AF019413.1


AF113919.1


AF113919.1


4506328


4506328


AF0876S8.1


U39687.1


§ I 

co o 
< < 


AA193672.1


AA193672.1


BE645620.1


BE645620.1


L01950.2


BF433956.1 


BF433958.1 


Most Similar 
(Top) Hit 
BLAST E 
Value 


5.4E-01 


5.4E-01 1 
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5.3E-01 


5.3E-01 1 
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I 
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1.78 


1.93| 


2.191 
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ID NO: 
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3841 7 1 
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SEQ ID 
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Top Hit Descriptor 


Emericella nidulans NEMPA (nempA) gene, mitochondrial gene encoding putative mitochondrial protein, 
complete cds 


Emericella nidulans NEMPA (nempA) gene, mitochondrial gene encoding putative mitochondrial protein, 
complete cds 


Murine cytomegalovirus e1 protein gene, complete cds


nh04h05.s1 NCI_CGAP_Thy1 Homo sapiens cDNA clone IMAGE:943353 similar to contains Alu repetitive
element;contains element L1 repetitive element ;


Xylella fastidiosa, section 177 of 229 of the complete genome
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oo76bOB.s1 NCI_CGAP_Kld5 Homo sapiens cDNA clone IMAGE:1 572087 3' similar to gb:M38341 ADP- 
RIBOSYLATION FACTOR 4 (HUMAN); 


ATRIAL NATRIURETIC PEPTIDE RECEPTOR B PRECURSOR (ANP-B) (ANPRB) (GC-B) (GUANYLATE 
CYCLASE) 


ATRIAL NATRIURETIC PEPTIDE RECEPTOR B PRECURSOR (ANP-B) (ANPRB) (GC-B) (GUANYLATE 
CYCLASE) 


Glycine max acetyl-CoA carboxylase (accB-1) gene, complete cds; nuclear gene for chloroplast product 


Glycine max acetyl-CoA carboxylase (accB-1) gene, complete cds; nuclear gene for chloroplast product
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PUTATIVE VITELLOGENIN RECEPTOR PRECURSOR (YL) I 


IL5-HT0730-100500-075-g05 HT0730 Homo sapiens cDNA


IL5-HT0730-100500-075-g05 HT0730 Homo sapiens cDNA


601126068F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:2989865 5'


Human thiopurine methyltransferase (TPMT) gene, exon 10 and complete cds


Human thiopurine methyltransferase (TPMT) gene, exon 10 and complete cds


HUM105F03B Clontech human fetal brain polyA+ mRNA (#6535) Homo sapiens cDNA clone GEN-105F03


601142105F1 NIH_MGC_14 Homo sapiens cDNA clone IMAGE:3505S93 5'


Deinococcus radiodurans R1 section 68 of 229 of the complete chromosome 1


Top Hit 
Database 
Source 


NT 


NT 


NT


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


SWISSPROT 


SWISSPROT 
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SWISSPROT 
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EST_HUMAN 
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NT 


EST_HUMAN


EST_HUMAN
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TopHitAcession. 
No. 


U62332.1 


- 

U62332.1 


L07320.1 | 


AA493577.1 


AE004031.1 | 


AA932237.1 


AA932237.1 


P55202 


P55202 


AF162283.1 


CD 
LL 
< 


AI915634.1


AI915634.1


P98163


BE185449.1


BE185449.1


BE272325.1


AF019369.1


AF019369.1


D53316.1


BE311420.1


AE001931.1 


Most Similar 
(Top) Hit 
BLAST E 
Value 


4.6E-01 


4.6E-01 


4.6E-01 1 


4.6E-01 


4.6E-01 1 


4.6E-01 


4.6E-01 


4.6E-01 


4.6E-01 


4.6E-01 


1 


4.6E-01 


4.6E-01 


4.6E-01I 


4.6E-01 1 


| 


4.6E-01 


1 

T 


3 

CD 


I 


- 


4.5E-01 


Expression 
Signal 


1.43 


1.43 


0.53 1 


0.78 


0.53 


0.47 


0.47 


0.99 


0.99 


0.89 


0.89 


1.621 


1.62| 


2.29 1 


4.94 1 


4.94 1 


3.88 


4.41 1 




1.69 


0.92 


1.34 


ORF SEQ 
ID NO: 


33474 


33475 


33759 | 


34333 


< 

i 

L 
C 


35490 


35491 


9 

§ 


36049 


36418 


36419 


367361 


36737j 




378181 


37818 


37930 


374761 


37477 




j 
1 


27947 


Exon 
SEQ ID 
NO; 


20156 


20156 


25681) 


20940 


21029 


22065 


22065 


22599 


22599 


5> 

a 


22951 


23259 


23259] 


24285 1 


242931 


24293 


24388I 


23954 


23954 


25208 


14748 


14050 


Probe 
SEQ ID 
NO: 


6932 


<N 

§ 


I 74411 


8001 


co c 
1! 


O CD 

§ i 


6606 


9656 


<D 

1 


10024 


10024 


10335 


10335 


1 


11343 


11343 


114451 


117991 


11799 


1 

12449 


j 1718 


1926 



56/546 



WO 01/57276 



PCT/US01/00668 






Top Hit Descriptor 


Rattus norvegicus SynGAP-b mRNA, complete cds


Rattus norvegicus SynGAP-b mRNA, complete cds


in b 

%i 
Is 

° c 

0 x 

g « 

•S I 
« S 

° c 

I 

1 2 

8-5 
a. c 

O 3 

0 I 

z u 
*? 

2, c 
FT <r 


HISTIDINE-RICH GLYCOPROTEIN PRECURSOR 


HISTIDINE-RICH GLYCOPROTEIN PRECURSOR


mucin [rats, Sprague-Dawley, sulfur-dioxide-treated tracheal epithelium, mRNA Partial, 390 nt]


AV720408 GLC Homo sapiens cDNA clone GLCCSC12 5'


qi62h11.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1861125 3' similar to TR:Q29168 Q29168
UNKNOWN PROTEIN ;


qi62h11.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1861125 3' similar to TR:Q29168 Q29168
UNKNOWN PROTEIN ;


xc27e08.x1 NCI_CGAP_Co18 Homo sapiens cDNA clone IMAGE:2585510 3' similar to TR:O95154 O95154
AFLATOXIN B1-ALDEHYDE REDUCTASE. ;


ae85d11.s1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:970965 3' similar to gb:M16038
TYROSINE-PROTEIN KINASE LYN (HUMAN);


Helicobacter pylori 26695 section 49 of 134 of the complete genome


zl69a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:509836 3'


HIV-1 isolate C8107v6 from USA, envelope glycoprotein (env) gene, partial cds


hh03c08.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2954222 3' similar to
SW:MSH6_HUMAN P52701 DNA MISMATCH REPAIR PROTEIN MSH6 ;


ZINC FINGER X-CHROMOSOMAL PROTEIN


qo39f09.x1 NCI_CGAP_Lu5 Homo sapiens cDNA clone IMAGE:1910921 3'


GLYCOPROTEIN B PRECURSOR (GLYCOPROTEIN 14)


* 

<r 

* 5 
O i 

to « 

8* 

°- c 
3 f 

P s 

0- g 

LU - 
O * 

Si 

UJ 5 

li 

? S 

ui 1 


beta-HKA=H,K-ATPase beta-subunit [rats, Genomic, 8983 nt segment 2 of 2]


Mus musculus sodium channel, type X, alpha polypeptide (Scn10a), mRNA


Autographa californica nucleopolyhedrovirus, complete genome


UV EXCISION REPAIR PROTEIN PROTEIN RAD23 HOMOLOG A (HHR23A)


Callithrix jacchus MW/LW opsin gene, upstream flanking region


Callithrix jacchus MW/LW opsin gene, upstream flanking region


QV4-SN0024-200400-183-b01 SN0024 Homo sapiens cDNA


Top Hit 
Database 
Source 


NT


NT


EST_HUMAN


SWISSPROT


SWISSPROT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


NT


EST_HUMAN


SWISSPROT


EST_HUMAN


SWISSPROT


SWISSPROT


NT


NT


NT


SWISSPROT


NT


NT


EST_HUMAN


Top Hit Accession
No.


AF058790.1


AF058790.1


BF056726.1


P04929


P04929


S65019.1


AV720408.1


AI198413.1


AI198413.1


AW080795.1


AA776132.1


AE000571.1


AA056427.1


AF112540.1


AW612578.1


O62836


AI268650.1


P28922


P35590


S76404.1


6677874


9627742


P54725


AF155218.1


AF155218.1


AW866550.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


4.4E-01I 


9 

UJ 


9 J 
^ H 
v - 


4.4E-01 


I 


1 


1 


4.4E-01 


4.4E-01 


1 

«*■ 


4.4E-01 


9 < 
^ L , 


li 


4.4E-01 1 


4.4E-01 


o 


§ 


4.4E-01 


4.4E-01 


4-.4E-01 


I 4.4E-01 


! 4.4E-01 


9 


9 
uu 

CO 
«*' 


4.3E-01 


| 4.3E-01 


Expression 
Signal 


1.361 


1.361 


5 r 


1.63 


1.631 




1.82| 


1.42 


1.42 


O) 


1.17 


0.95I 


0.74 


0.72i 


0.561 


1.21 




3.91 


L 507 


1.27 


5.76 


CO 
CO 

c\i 


1.45 


2.49 


2.49 


0.96 


ORF SEQ
ID NO:


29298 


29299 


g 

s 

CM 


31505 


31506 


32045 


32064 I 


32356 


1 32357 


32680 




1 

to 


36509 


35913 


CO 

s 

10 

CO 


36038 


36725 




36872i 


37157 1 


CO 


31727 




CO 

1 


i 26420 


i 27616 


Exon 
SEQ ID 

NO: 


16377 


16377 


16381 


18594 


18594 


18864 


188821 


5 


in 

5 


19436 


I 19527 


20587 


22081 


i 
St 


22601 


22589 


23245 


23246 


23379 


236621 


25196 


25466 


25535 


13484 


13484 


14639 


Probe 
SEQ ID 

NO: 


3326 


I 3326 


[ 3330 




5494 


5772 


1 

lO 


S 

o 


6064 


6368 


6462 


| 7627 


9116 


9506 


9538 


3 

CO 


10321 


10322 


10457 


10740' 


12432 


12861 


12867 




5 


| 1607 



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s 
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I 
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CO 01 
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§ ■§ S3 

to -b <» 
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1 "B 






Top Hit Descriptor 
EST21715 Adrenal gland tumor Homo sapiens cDNA 5' end


to 

s 

o 

ili 
o 
< 

2 
o 
| 

< 
z 

D 
<n 

If 

§ s 
x f 
5 j 

Q-. E 

ii 

5 o 

O) CN 
CO 2 

LL. O 

Z.S 

I! 

CO £ 

Si? 

<3 9 

O) CO 

Ii 


Neisseria meningitidis serogroup B strain MC58 section 50 of 206 of the complete genome


Homo sapiens interferon-induced protein p78 (MX1) gene, complete cds


Homo sapiens chromosome 21 segment HS21C078


Chicken (White leghorn) delta-1 and delta-2 crystallin genes, complete cds


Mus saxicola haptoglobin mRNA, complete cds


Homo sapiens tumor endothelial marker 7 precursor (TEM7), mRNA


601483887F1 NIH_MGC_69 Homo sapiens cDNA clone IMAGE:3886652 5'


601483887F1 NIH_MGC_69 Homo sapiens cDNA clone IMAGE:3886652 5'


ya50a07.r3 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:66324 5'


Homo sapiens chromosome 12 open reading frame 4 (C12ORF4), mRNA


Homo sapiens chromosome 12 open reading frame 4 (C12ORF4), mRNA


ok43b11.s1 NCI_CGAP_Lei2 Homo sapiens cDNA clone IMAGE:1516701 3'


Gallus gallus mRNA for beta-carotene 15,15'-dioxygenase (bCDO gene)


mouse ig germline alpha membrane exons region


qt46b07.x1 Soares_fetal_lung_NbHL19W Homo sapiens cDNA clone IMAGE:1950997 3'


Rabbit mRNA for fast skeletal muscle myosin heavy chain (MHC)


Homo sapiens partial LIMD1 gene for LIM domains containing protein 1 and KIAA0851 gene


Homo sapiens partial LIMD1 gene for LIM domains containing protein 1 and KIAA0851 gene


Bovine mRNA for terminal deoxynucleotidyltransferase (TdT) (EC 2.7.7.31)


oo46d03.s1 NCI_CGAP_Lu5 Homo sapiens cDNA clone IMAGE:1569221 3' similar to gb:M77698
TRANSCRIPTIONAL REPRESSOR PROTEIN YY1 (HUMAN);


Mus musculus retinoblastoma 1 (Rb1), mRNA


Human heart/skeletal muscle ATP/ADP translocator (ANT1) gene, complete cds


Chlamydophila psittaci partial omp1 gene for outer membrane protein 1


DKFZp762K075_r1 762 (synonym: hmel2) Homo sapiens cDNA clone DKFZp762K075 5'


Homo sapiens NF2 gene


Human mibp gene, partial cds


yd03e05.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:24443 5'


yd03e05.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:24443 5'


CO 

I 

iii 
o 

1 

I 
< 

z 

1 

i 

o 

E 
o 
X 

8 
o 

i 

o 

o 
z 

s 

e 

CO 
CO 
TO 
J= 


CO 
CO 

§ 

CN 
LU 

o 

1 

CD 

s 

t> 

2 
9 

to 

I 

X 

8 
o 

a. 1 

5 
3 

o 
z 

5 

2 
to 
to 

2 


Top Hit 
Database 
Source 

EST_HUMAN


EST_HUMAN
EST_HUMAN


NT


NT


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


EST_HUMAN


NT


NT


EST_HUMAN


NT


NT


NT


NT


EST_HUMAN


NT


NT


NT


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


AI218707.1 
AW878037.1 


AE002408.1


AF135187.1


AL163278.2


M10806.1


L10353.1


11325843


BE873743.1


BE873743.1


T66802.1


11436739


11436739


AA902912.1


AJ271386.1


K00691.1


AI336411.1


X05958.1


AJ297357.1


AJ297357.1


X04122.1


AA973540.1


6677678


J04982.1


AJ243525.1


JN 

Ij 
< 


Y18000.1


U89241.1


T80255.1


T80255.1


AW590184.1


AW590184.1


Most Similar 
(Top) Hit 
BLAST E 
Value 

3.7E-01 


3.7E-01 
3.7E-01 


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


3.7E-01


9 

CO 


3.7E-01


3.7E-01


3.7E-01


3.7E-01


9 

Ui 
r- 

co 


37E-01 1 


9 

Ui 
r- 

co 


3.7E-01I 


3:6E-01| 


9 

LU 

to 
to' 


3.6E-01' 


o 

CO 


9 
« 

CO 


Expression 
Signal 

0.7 


en 3 

CD T-' 




1.16| 


1.35| 


0.66 1 


0.77| 


3.48 1 


0.65] 


0.65 1 


0.71] 


1.93 1 


1.93| 


0.69| 


3.78| 


0.52| 


3.05| 


CD 


5 
c\i 


CO 

tsl 


2.34, 


1.53 


3.22 1 


to 
Oi 


4.23| 


1.86| 


Z71| 


11.361 


266' 


cnJ 


8 

to 


6.09 


ORF SEQ 
ID NO: 

29833 


30168 
30255 


30334] 


32132| 


323 53 1 


CO 
CO 

a 

CO 




33667] 


33983 1 


33984] 


34404| 


3B064| 


35065] 


35101] 






36984| 


37689| 


37842| 


37843 1 


37470 1 












31722] 




27320 | 


27321 1 


27951 


27952 


Exon 
SEQ ID 
NO: 

16924 


17286 
17375 


17443| 


18947] 


19141] 


19716| 


19737] 


20320 | 


to 
to 


20618] 


21006| 


21 640 | 


21 640 | 


§ 
eg 


22518| 


23452] 


23492| 


241 59 1 


24316] 


24316| 


i 


24891 


24933 | 


25501 | 


25117] 


25406| 


25447| 


14048| 


CM 

to 

CO 


14352] 


14955 1 


14955 


Probe 
SEQ ID 
NO: 

3864 


4257 
4348 


| 4416 


| 5857 | 


| 6060] 


CO 

to 
to 




| 7350] 


| 7658] 


| 7658] 


| 8069] 


| 8672] 


1 8672| 


i 8708 | 


j 9556 | 


| 10530| 


o 

8 


| 11205| 


| 11369] 


| 11369| 


| 117941 


12014 


[ 12060 1 


| 12136| 


o 


[ 12764) 


| 12829] 


s 


| 1317| 


i 1317] 


| 19311 


| 19311 






Top Hit Descriptor 


Homo sapiens lysosomal-associated membrane protein 2 (LAMP2), transcript variant LAMP2A, mRNA


Homo sapiens chromosome 21 segment HS21C004


to c 
06 oi 

LO If 

CO c 

Ii 

© 0 
<D ( 

n> r 

1 1 

(0 t 

1 \ 
1 i 

ii 


C.perfringens plc gene for phospholipase C upstream region containing bent DNA fragment


PROBABLE PEPTIDE ABC TRANSPORTER ATP-BINDING PROTEIN Y4TS


MR2-CT0222-211099-002-b10 CT0222 Homo sapiens cDNA


< 
o 

Ui 

c 

CO 

1 

1 

i 

o 
O 
o 

A 
o 

S 

o> 
o 

CM 
CM 

a 

o 

h- 

O 

s 


601676418F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3958997 5'


Arabidopsis thaliana mRNA for SigB, complete cds


Methanobacterium thermoautotrophicum from bases 702375 to 714311 (section 62 of 148) of the complete
genome


Homo sapiens hHb5 gene for hair keratin, exons 1 to 9


Synechocystis sp. PCC6803 complete genome, 3/27, 271600-402289


Escherichia coli K-12 MG1655 section 225 of 400 of the complete genome


Mus musculus Emr1 mRNA, complete cds


Homo sapiens myeloid/lymphoid or mixed-lineage leukemia (trithorax (Drosophila) homolog); translocated to,
10 (AF10), mRNA


xl60e11.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2679116 3' similar to gb:K00558 TUBULIN
ALPHA-1 CHAIN (HUMAN);


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 36


Mus musculus mannose receptor, C type 2 (Mrc2), mRNA


Homo sapiens GAP-like protein (LOC51309), mRNA


Homo sapiens GAP-like protein (LOC51306), mRNA


601811060R1 NIH_MGC_48 Homo sapiens cDNA clone IMAGE:4053951 3'


601894653F2 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4124244 5'


Rattus norvegicus ADP-ribosylation factor-directed GTPase activating protein mRNA, complete cds


HOMEOBOX PROTEIN HOX-A4 (HOX-1.4) (MH-3)


zr08a09.s1 Stratagene NT2 neuronal precursor 937230 Homo sapiens cDNA clone IMAGE:650872 3'


nr60d03.s1 NCI_CGAP_Lym3 Homo sapiens cDNA clone IMAGE:1172357 3'


Danio rerio homeobox protein (hoxb5b) gene, complete cds


Top Hit
Database
Source


NT


NT






SWISSPROT


2 

i 

s 


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


NT


NT


NT


EST_HUMAN


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


SWISSPROT


EST_HUMAN


EST_HUMAN


NT


Top Hit Accession
No.


4504956


AL163204.2


X17550.1


X62825.1


Q53194


AW752901.1


AW752901.1


BE902390.1


AB004293.1


CD 

to 

CO 

I 

< 


d 
cm 


D90901.1 ] 


AE000335.1 


cd 

§ 

CO 

Z) 


11432598 


AW19D229.1 


1 

CO 
Ij 
< 


6678933 1 


7706136, 


7706136 


BF129796.1


BF310688.1


U35776.1




AA223252.1


AA642138.1


AF071253.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


3.6E-01 


3.6E-01


3.6E-01


3.6E-01


3.6E-01


3.6E-01


3.6E-01


3.6E-01


3.6E-01


3.6E-01


i 

CO 


3.6E-01J 


3.6E-01 ' 


0 
ill 

CD 
CO 


CO 


0 

i 


1 

CO 


3.5E-01 


3.5E-01 


3.5E-01 


3.5E-01 


3.5E-01 


3.5E-01 


3.5E-01 


1 

CO 


s 

CO 


0 

CO 


§ 

8 S 


2.84 


1.12| 


0.93I 


0.54 


16.661 


0.48| 


CO 
CD 


3^4l 


3.26| 


5.83 


s 

CM" 


1.421 


e.35| 


CO 
CM 
CD 


1.97 


3.33 


1.42] 


3.29 1 


1.48i 


1.48] 


4.95| 


CD 
CO 

d 


r- 

LO 
C\f 


1.28! 


1.13 


7.27 


2.18 


ORFSEQ 
ID NO: 


35747 


359591 


361511 


3 

0 

n 


366361 


36750] 


367511 


37769I 


37925| 


37448 














26160 


26238 


26722 


26723 


26785 


27634 


27661 


28328 


28636 




30199 


Exon 
SEQ ID 

NO: 


22320 


225101 


226951 


22673 


231471 


23276I 


232761 


24242 1 


24385| 


23927 


s 


250761 


25083! 


25187| 


25410 


25934 


132261 


133111 


137871 


13787] 


138401 


CD 

i 


14678 


15303! 


§ 
to 


co 

LO 

co 

CO 


17320 


Probe 
SEQID 

NO: 


| 9355 


I 95471 


ism 


1 9824 


102221 


[ 103521 


! 103521 


I 112921 


11442 


I 117721 


I 121741 


1 122471 


j 122571 


| 124171 


I 12770 1 


| 13033 I 


to 


0 

CM 


8 


CD 
CM 
f- 


0 

CO 

1^ 


S 1623! 


1646 


I 22911 


I 261 3! 


CO 

8 


| 4291 i 



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repetitive element;contalns element L1 repetitive element ; 


J2498F Human fetal heart, Lambda ZAP Express Homo sapiens cDNA clone J2498 5' similar to TEGT 
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RC4-TN0077-25O800-O1 1-Q.04 TN0077 Homo sapiens cDNA | 


Homo sapiens high-mobility group phosphoprotein (HMGI-C) gene, exons 1-3, complete cds
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jhv51g02Jd NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3176978 3" | 


GALECTIN-3 (GALACTOSE-SPECIFIC LECTIN 3) (MAC-2 ANTIGEN) (IGE-BINDING PROTEIN) (35 KD
LECTIN) (CARBOHYDRATE BINDING PROTEIN 35) (CBP 35) (LAMININ-BINDING PROTEIN) (LECTIN
L-29) (CBP30)


ob71g02.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1336850 3'


Rhizobium leguminosarum sym plasmid pRL5JI nodX gene


Homo sapiens aldehyde oxidase 1 (AOX1), mRNA


Pyrococcus horikoshii OT3 genomic DNA, 287001-544000 nt. position (2/7)
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Rattus norvegicus EH domain binding protein Epsin mRNA, complete cds j 


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 61


Fusarium poae virus 1 RNA2 putative RNA dependent RNA polymerase gene, complete cds


P. vulgaris arc5-1 gene


LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT) (LACTOSE TRANSPORT PROTEIN)


Arabidopsis thaliana cultivar Columbia RPP13 (RPP13) gene, complete cds


S.cerevisiae chromosome II reading frame ORF YBR172c


EST369264 MAGE resequences, MAGD Homo sapiens cDNA


EST369264 MAGE resequences, MAGD Homo sapiens cDNA
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j Oryctolagus cunlculus Ig H-chain pseudogene, V-region (VH6-a2) gene, partial cds j 


Top Hit Descriptor 


I6018Q8804F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:411151; 


Homo sapiens promyelocytic leukemia zinc finger protein (PLZF) gene, coi 
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Rabbit beta-like globin gene cluster encoding the epsilon, gamma, delta (p: 
polypeptides, complete cds 


[HYPOTHETICAL 81.7 KD PROTEIN C13G7.04C IN CHROMOSOME 1 1 


|602081972F1 NIHJv1GC_81 Homo sapiens cDNA clone IMAGE:424650£ 


CYTADHERENCE HIGH MOLECULAR WEIGHT PROTEIN 3 (CYTAD 1- 
PROTEIN 3) (ACCESSORY ADHESIN PROTEIN 3) (P69) 


[Homo sapiens interleukin 12 p40 subunit (IL12B) gene, IL12B-1 allele, cor 


ws25b06.x1 NCI_CGAP_GG3 Homo sapiens cDNA clone IMAGE:24981* 
repetitive element;contains element PTR7 repetitive element ; 
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Fugu rubripes gamma-amlnobutyric acid receptor beta subunit gene, partis 
protein (P55), synaptic vesicle-associated integral membrane protein (VAfv 
enhancer protein (PCOLCE) gene3, complete c> 


AV718037 FHTA Homo sapiens cDNA clone FHTAABH01 5'


Human mRNA for KIAA0361 gene, KIAA0361 protein


Homo sapiens partial LMO1 gene for LIM domain only 1 protein, exon 1


Rat iso-atrial natriuretic factor gene, complete cds


Rattus norvegicus repeat; map NOS-D12Wox1


H.sapiens gene fragment for acetylcholine receptor (AChR) alpha subunit 
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Deinococcus radiodurans R1 section 1 52 of 229 of the complete chromosc 
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Top Hit Descriptor 
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jyu04f07.s1 Scares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:232837 3* I 


Homo sapiens potassium voltage-gated channel, subfamily H (eag-related), member 4 (KCNH4), mRNA


nq90b10.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1159579 3'


Beta vulgaris mitochondrion, complete genome


Thermotoga maritima section 105 of 136 of the complete genome


IMMEDIATE-EARLY PROTEIN IE180


IMMEDIATE-EARLY PROTEIN IE180


Homo sapiens mRNA for KIAA1215 protein, partial cds


Homo sapiens pshsp47 gene, complete cds


ALPHA-2A ADRENERGIC RECEPTOR (ALPHA-2A ADRENOCEPTOR) (ALPHA-2AAR)


Helicobacter pylori, strain J99 section 87 of 132 of the complete genome


602152001F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4293001 5'


Doto fragilis mitochondrial 16S rRNA gene, partial


Human olfactory receptor (OR17-2) gene, partial cds


VOLTAGE-GATED POTASSIUM CHANNEL PROTEIN KV3.3 (KSHIIID)


VOLTAGE-GATED POTASSIUM CHANNEL PROTEIN KV3.3 (KSHIIID)


Archaeoglobus fulgidus section 135 of 172 of the complete genome


Canis familiaris keratin (KRT9) gene, complete cds


Glycine max malate dehydrogenase (Mdh-2) gene, nuclear gene encoding mitochondrial protein, partial cds


Glycine max malate dehydrogenase (Mdh-2) gene, nuclear gene encoding mitochondrial protein, partial cds


yd83b01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:114793 5'


Mus musculus erythrocyte protein band 4.1-like 3 (Epb4.1l3), mRNA


Haemophilus influenzae hmcD, putative haemocin processing protein (hmcC), putative ABC transporter
(hmcB), putative haemocin structural protein (hmcA), and haemocin immunity protein (hmcI) genes, complete
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|DKFZp434H0614j1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434H0614 5' j 


DKFZp434H0614_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434H0614 5'


S.cerevisiae chromosome II reading frame ORF YBL025w


yy11e10.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:270954 5'


yy11e10.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:270954 5'


Top Hit 
Database 
Source 


EST_HUMAN


NT


EST_HUMAN


NT


NT


SWISSPROT


SWISSPROT


NT


NT


SWISSPROT


NT


EST_HUMAN


NT


NT


SWISSPROT


SWISSPROT


NT




NT


NT


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


H73968.1


6912445


AA639482.1


9838361


AE001793.1


P11675


P11675


AB033041.1


AB010273.1


Q01338


AE001526.1


BF672695.1


oi 
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CD 
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O 


Q01956


Q01956


AE000972.1


AF000949.1


AF068687.1


AF068687.1


T87354.1


7305030


U68399.1


AL040537.1


AL040537.1


Z35786.1


N42536.1


N42536.1


Most Similar 
(Top) Hit 
BLAST E 
Value


2.1E-01 1 


3 

oi 


2.1E-01 1 


o 
LU 
csi 


1 


2.1E-01 1 


1 

CN 


2.1 E-0 1 1 


2.1E-01 


2.1E-01 1 


I 2.1E-01I 


I 2.1E-01 


! 2.1E-01 [ 


| 2.1E-01 1 


| 2.1E-01| 


I 2.1E-01I 


oi 


I 2.1E-01 


2.1E-01 


2.1E-01 


\ 2.1E-01 


| 2.1E-01 
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s 

oi 


3 

oi 
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Expression 
Signal 


2.1 9 1 


CM 
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1.161 


1.48| 


1.481 


I 1.62| 


I 1.751 




0.92] 


I 5.48| 


1.151 


I 1.92| 


0.78| 


! 0.78| 


I 2.34| 


177 


1.08 


1.08 


I 0.51 1 


! 1.04[ 


5.05 


! 0.84I 


i 0.84 


I 6.08 


I 0.59 


I 0.59 


ORF SEQ 
ID NO: 


285171 


28914 


29428 






300131 


300141 




i 30520! 


30994J 


310991 


31352J 


334031 


33320! 


33959 1 


33960 1 




34303 1 


34359 


34360 






35234 


I 35540 


| 35541 


I .35787 


1 36251 


j 36252 


Exon 
SEQ ID 
NO: 


158951 


15994 


I 16507 


189891. | 


1 170831 


17118 


1 171181 


I 17431 I 


176281 


18120 


| 18225| 


| 18479 1 


1 20093| 


I 20017| 


I 20596! 


| 20596I 


| 20608 1 


| 209121 


20994 


20964 


| 210221 


| 21376! 


21814 


22115! 


I 22115 


I 22357! 


| 22798; 


CO 


Probe 
SEQ ID 
NO: 


1 24891 


I 2936 


3461 


1 38191 


1 40451 


4084 


1 40841 


| 4403 I 


1 46071 


[ 5110 


I 52161 


I 5374i 


1 7071 1 


I 7083 | 


I 7636 I 


I 7636I 


| 7848 


1 7973 


l 8027 


8027 


1 8086 


I 8407 
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9149 
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Top Hit Descriptor 


601895465F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4124824 5'


zd94a04.r1 Soares_fetal_heart_NbHH19W Homo sapiens cDNA clone IMAGE:357102 5' similar to contains
element KER repetitive element ;


M.vannielii genes rpoH, rpoB and rpoA


Homo sapiens PHEX gene


Homo sapiens PHEX gene


Drosophila melanogaster signal transducing adaptor protein (STAM), serine threonine kinase ial (IAL), and
zinc finger protein (DN21) genes, complete cds


C.perfringens ORF for putative membrane transport protein


Macromitrium levatum small ribosomal protein 4 (rps4) gene, chloroplast gene encoding chloroplast protein,
partial cds


df29h08.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2485094 5'


df29h08.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2485094 5'


< 
z 

D 
t> 
v> 
c 

O) 

I 

i 
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I 

to 

n 

v> 

co 
? 

CO 
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1 
1 

1 

s 


|MR3-ST0218-211299-013-a08 ST0218 Homo sapiens cDNA I 


Iyd47d03.r1 Soares fetal liver spleen 1 NFLS Homo sapiens cDNA clone IMAGE:1 1 1365 5' j 

Baalim. *i ihHllc f-nmnfata rrannma ^ecyfinn 1inf 91 V frnm 0*00451 tn 0fl1 OA7A I 


yj70c05.r1 Soares breast 2NbHBst Homo sapiens cDNA clone IMAGE:154088 5'


INTEGRIN ALPHA-5 PRECURSOR (FIBRONECTIN RECEPTOR ALPHA SUBUNIT) (INTEGRIN ALPHA-F) (VLA-5) (CD49E)


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 8


Borrelia burgdorferi glyceraldehyde-3-phosphate dehydrogenase (GAPDH), phosphoglycerate kinase (PGK),
triosephosphate isomerase (TPI) genes, complete cds


M.musculus p16K gene for 16 kDa protein


[illegible]


Rattus norvegicus desmin (Des), mRNA


601315638F1 NIH_MGC_8 Homo sapiens cDNA clone IMAGE:3634329 5'


Synechocystis sp. PCC6803 complete genome, 23/27, 2868767-3002955


TYROSINE-PROTEIN KINASE TRANSFORMING PROTEIN ABL


Mus musculus mRNA for prolidase, complete cds


MR0-HT0208-221299-204-c08 HT0208 Homo sapiens cDNA


Homo sapiens G protein-coupled receptor 50 (GPR50) mRNA


Homo sapiens G protein-coupled receptor 50 (GPR50) mRNA


Top Hit 
Database 
Source 


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


SWISSPROT


NT


NT


NT


NT


EST_HUMAN


NT


SWISSPROT


NT


EST_HUMAN


NT


NT


Top Hit Accession
No.


BF310959.1


[illegible]


X73293.1


Y10196.1


Y10196.1


AF121361.1


[illegible]


AF023813.1


AW021908.1


AW021908.1


BF375285.1


BF375286.1


T84293.1


R53400.1


P08648


AL161496.2


U28760.1


X52102.1


X74773.1


11968117


BE513802.1


D64004.1


P10447


D82983.1


AW377G98.1


4758467


4758467


Most Similar 
(Top) Hit 
BLAST E 
Value 


1.4E-01


1.4E-01


1.4E-01


1.4E-01


[illegible]


1.4E-01


1.4E-01


1.4E-01


Expression
Signal


9.19


1.19


0.43


1.44


1.44


2.06


0.55


[illegible]


0.57


0.57


0.67


0.67


0.56


2.59


2.53


1.59


2.38


1.55


2.33


2.24


2.35


2.29


4.86


3.72


2.63


2.48


2.48


ORFSEQ 
ID NO: 


35953) 


36024 


36109] 


36123| 


I 36124| 


34532 


[ 36567I 


36747 


i 36861| 


36862] 


1 370391 


37040] 




| 37680| 


37916 




37474 




( 31776! 














I 26338 


| 26339 


Exon 
SEQ ID 

NO: 


22504] 


22674 


22655I 


22666 


22666| 


21128 


23089 


23270 


23370] 


23370! 


235411 


23541 


. 23750 1 


' 24147) 


24376 


s 

T 
04 


23952 


I 24737I 


I 25272 


I 25280 


1 25964 


I 25362 


| 25981 


I 25792 


| 25601 


I 13414 


| 13414 


loo 


I 95411 


9630 


| 9702I 


971 3j 


! 971 3| 


9805 


10164 


10346 


1 104481 


I 104481 


§ 


10619 


I 108291 


| 11191] 


11432 


| 11752| 


I 11797l! 


11855 


| 125491 


( 125621 


1 126051 


| 127021 


| 12776! 


[illegible]


Top Hit Descriptor


ah31b06.s1 Soares_parathyroid_tumor_NbHPA Homo sapiens cDNA clone 1240403 3' similar to gb:Ji
CHROMOGRANIN A PRECURSOR (HUMAN);


601470055F1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3873229 5'


Methanococcus jannaschii section 34 of 150 of the complete genome


nh04g10.s1 NCI_CGAP_Thy1 Homo sapiens cDNA clone IMAGE:943362


nh04g10.s1 NCI_CGAP_Thy1 Homo sapiens cDNA clone IMAGE:943362


[illegible]


PM1-ST0270-080200-001-f09 ST0270 Homo sapiens cDNA


DKFZp547P194_r1 547 (synonym: hfbr1) Homo sapiens cDNA clone DKFZp547P194 5'


Pediococcus acidilactici H plasmid pSMB74 pediocin AcH production (pap) gene cluster papA, papB,
and papD genes, complete cds


wf48c01.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2358816 3' similar to contains
repetitive element;


Homo sapiens C16orf3 large protein mRNA, complete cds


zp93b12.r1 Stratagene muscle 937209 Homo sapiens cDNA clone IMAGE:627743 5'


zp93b12.r1 Stratagene muscle 937209 Homo sapiens cDNA clone IMAGE:627743 5'


P.furiosus partial dphS gene and ergF gene


yd19h03.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:108725 3' similar to
gb:M81181 SODIUM/POTASSIUM-TRANSPORTING ATPASE BETA-2 (HUMAN);


601436972F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3922048 5'


CM3-HT0142-271099-026-g11 HT0142 Homo sapiens cDNA


MR2-GN0027-040900-005-a08 GN0027 Homo sapiens cDNA


601140231F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3049543 5'


yj98a09.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:147064 3'


Ceratitis capitata yoyo retrotransposon gag-like, pol-like and env-like genes, complete cds


HSC1RF022 normalized infant brain cDNA Homo sapiens cDNA clone c-1rf02 3'


Carassius auratus activin beta A precursor, mRNA, complete cds


yh35f12.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:131759 5' similar to contains
repetitive element;contains TAR1 repetitive element ;


Rattus norvegicus Phosphofructokinase, liver, B-type (Pfkl), mRNA


Z.mobilis tgt and lig genes encoding tRNA guanine transglycosylase and DNA ligase


Z.mobilis tgt and lig genes encoding tRNA guanine transglycosylase and DNA ligase


SKIN SECRETORY PROTEIN XP2 PRECURSOR (APEG PROTEIN)


ORF SEQ
ID NO: 


34294


34448


34682


34936


34937


34982


35562


35660


35756


35794


35795


SEQ ID

NO:


22229


22327


22363


Top Hit Descriptor


Human gene for dihydrolipoamide succinyltransferase, complete cds (exon 1-15)


PM3-BT0347-170200-001-b08 BT0347 Homo sapiens cDNA


Synechocystis sp. PCC6803 complete genome, 17/27, 2137259-2267259


Synechocystis sp. PCC6803 complete genome, 17/27, 2137259-2267259


601855548F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4075619 5'


Dictyostelium discoideum cyclic nucleotide phosphodiesterase gene, complete cds


Thermoplasma acidophilum complete genome; segment 5/5


EST378191 MAGE resequences, MAGI Homo sapiens cDNA


Arabidopsis thaliana putative transcription factor (HUA2) mRNA, complete cds


M.musculus gene for gelatinase B


EST363209 MAGE resequences, MAGA Homo sapiens cDNA


Homo sapiens ABCA1 (ABCA1) gene, complete cds


Homo sapiens ABCA1 (ABCA1) gene, complete cds


Botrytis cinerea strain T4 cDNA library under conditions of nitrogen deprivation


H.sapiens AGT gene, intron 4


H.sapiens AGT gene, intron 4


Homo sapiens chromosome 21 segment HS21C00B


Homo sapiens SCG10 like-protein, helicase-like protein NHL, M68, and ADP-ribosylation factor related
protein 1 (ARFRP1) genes, complete cds


Drosophila orena hunchback region


Homo sapiens cAMP responsive element binding protein-like 2 (CREBL2) mRNA


600943191F1 NIH_MGC_15 Homo sapiens cDNA clone IMAGE:2959510 5'


ar98c08.x1 Barstead colon HPLRB7 Homo sapiens cDNA clone IMAGE:2173646 3' similar to gb:Z26876
60S RIBOSOMAL PROTEIN L38 (HUMAN);


Mus musculus colony stimulating factor 1 receptor (Csf1r), mRNA


Mus musculus colony stimulating factor 1 receptor (Csf1r), mRNA


Arabidopsis thaliana RXW24L mRNA, partial cds


RC3-GN0042-310800-024-d11 GN0042 Homo sapiens cDNA


ou63b05.s1 NCI_CGAP_Br2 Homo sapiens cDNA clone IMAGE:1632465 3' similar to WP:C37A2.2
CE08611 ;


Top Hit 
Database 
Source 


NT 


EST_HUMAN


NT


NT


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


NT


NT


EST_HUMAN


NT


NT


NT


NT


NT


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


NT


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


D26535.1


BE067219.1


D90915.1


D90915.1


BF246744.1


[illegible]


AL445067.1


[illegible]


[illegible]


AF116550.1


X72794.1


AW951139.1


AF275948.1


AF275948.1


AL114993.1


X74208.1


X74208.1


AL163209.2


AF217796.1


AJ005375.1


4503034


BE250008.1


AI582029.1


6681044


6681044


BF348454.1


AB008019.1


BF368016.1


AI081644.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


[illegible]


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


[illegible]


8.0E-02


8.0E-02


8.0E-02


8.0E-02


[illegible]


8.0E-02


8.0E-02


8.0E-02


8.0E-02


8.0E-02


7.9E-02


7.9E-02


7.9E-02


7.9E-02


[illegible]


7.9E-02


7.9E-02


7.9E-02


Expression 
Signal 


13.63


[illegible]


1.05


1.05


4.69


0.99


[illegible]


6.64


[illegible]


7.57


0.71


3.28


1.44


8.74


1.21


1.21


[illegible]


2.19


6.54


2.06


[illegible]


[illegible]


5.68


5.68


1.08


[illegible]


1.06


4.89


ORFSEQ 
ID NO: 


27723 j 


27939 | 


28417| 


28418| 




27088] 


28892| 


29772 1 


30724i 


30733 1 




321 05| 


32274| 


32274 1 


34851 | 


36139| 


36140| 




37632 


31 798 1 




2821 9| 


28971 


29808| 


29809] 


30633 1 




c 


36788 


Exon 
SEQ ID 
NO: 


15875 


14943, 


15392| 


15392! 


15482! 


14137! 


1 

10 


16870 1 


17827) 


17835| 


17869| 


18922| 


19077| 


19077| 


21434| 


22685| 


22685| 


23441 | 


24105 


25230 1 


18342) 


66 WV '■ 


16050 


8 
8 


16903] 


17742| 


17866| 


80661- 1 


23310 


Probe 
SEQ ID 
NO: 




| 1919| 


! 2384] 


| 2384I 


| 2478| 


I 2831| 


i 2911 1 


I 3830| 


| 4810| 


CO 


4852| 


5832| 


5993 1 


7386| 


8465 [ 


9744| 


9744 1 


| 10519| 


11145 


12483| 


j 13036| 


2184| 


CM 

8 


| 3864 1 


| 3864! 


( 47221 


i 


m ct 
to CI 

§ s 


10388 



130/546 



WO 01/57276 



PCT/US01/00668 






Top Hit Descriptor 


Methanobacterium thermoautotrophicum from bases 1029165 to 1039934 (section 88 of 148) of the complete
genome


Homo sapiens chromosome 21 segment HS21C101


Homo sapiens chromosome 21 segment HS21C101


Human immunodeficiency virus type 1 isolate 26 reverse transcriptase (pol) gene, internal fragment, partial
cds


602077757F1 NIH_MGC_62 Homo sapiens cDNA clone IMAGE:4251950 5'


Methanococcus jannaschii section 73 of 150 of the complete genome


601883558F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4095710 5'


Streptococcus pneumoniae putative response regulator (zmpR), putative histidine kinase (zmpS), and putative
zinc metalloprotease (zmpB) genes, complete cds


Strongylocentrotus purpuratus mitochondrion, complete genome


PROLINE-RICH PROTEIN MP-3


PROLINE-RICH PROTEIN MP-3


Lactococcus lactis cspE gene


Human gene for sex hormone-binding globulin (SHBG)


AV712462 DCA Homo sapiens cDNA clone DCAAUG01 5'


Homo sapiens plasma membrane calcium ATPase isoform 1 (ATP2B1) gene, alternative splice products,
partial cds


601763523F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:4026436 5'


hq24f11.x1 NCI_CGAP_Adr1 Homo sapiens cDNA clone IMAGE:3120333 3' similar to TR:Q9Z340 Q9Z340
ATYPICAL PKC SPECIFIC BINDING PROTEIN. ;


oa62c07.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1316844 3'


Homo sapiens zinc finger protein 92 (ZFP92), expressed-Xq28STS protein (XQ28ORF), and biglycan (BGN)
genes, complete cds; and plasma membrane calcium ATPase isoform 3 (PMCA3) gene, partial cds


601343926F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3685951 5'


601065194F1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:3451559 5'


Rattus norvegicus bHLH transcription factor Mist1 (Mist1) gene, complete cds


AJ230796 Homo sapiens library (Seranski P) Homo sapiens cDNA clone PS13D5 3'


Mus musculus second IL11 receptor alpha chain (IL11Ra2) gene, exons 1 and 2


Citrobacter freundii DSM 30040 cyclopropane fatty acid synthase (cfa) gene, partial cds, dihydroxyacetone
kinase (dhaK), glycerol dehydrogenase (dhaD), transcriptional activator (dhaR), 1,3-propanediol
dehydrogenase (dhaT), glycerol dehydratase (dhaB),>


Homo sapiens hypothetical protein SIRP-b2 (SIRP-b2), mRNA


Oryza sativa rbbi3-1 gene for putative Bowman Birk trypsin inhibitor


RC5-BT0559-140200-012-C03 BT0559 Homo sapiens cDNA


Hirudo medicinalis SNAP-25 homolog mRNA, complete cds


Bacillus subtilis complete genome (section 13 of 21): from 2395261 to 2613730


Homo sapiens TESTIN 2 and TESTIN 3 genes, complete cds, alternatively spliced


Neurospora crassa ubiquinol-cytochrome c oxidoreductase subunit VIII (QCR8) mRNA, complete cds


QV0-ST0213-021299-062-a09 ST0213 Homo sapiens cDNA


QV0-ST0213-021299-062-a09 ST0213 Homo sapiens cDNA


ye37f12.r1 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:119951 5' similar to gb:K01506
HLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DP(1) ALPHA CHAIN (HUMAN);


Drosophila melanogaster laminin B2 gene, complete cds


Drosophila melanogaster laminin B2 gene, complete cds


Pseudomonas putida ttgS gene


Arabidopsis thaliana el!5 gene, exons 1-11


Mus musculus caudal type homeobox-1 (Cdx-1) gene, complete cds


Helicobacter pylori 26695 section 5 of 134 of the complete genome


Helicobacter pylori 26695 section 5 of 134 of the complete genome


Human heparan sulfate proteoglycan (HSPG2) mRNA, complete cds


Lymphocystis disease virus 1, complete genome


Haemophilus influenzae Rd section 147 of 163 of the complete genome


nuclear protein TIF1 isoform [mice, mRNA, 4053 nt]


HYPOTHETICAL 130.0 KD PROTEIN IN SNF6-SPO11 INTERGENIC REGION


Mus musculus 129/Sv cystatin C (cst3) gene, complete cds


Podospora anserina mitochondrial epsilon-sen DNA


Homo sapiens hCMT1b mRNA for mRNA (guanine-7-)methyltransferase, complete cds


Top Hit Descriptor


EST28167 Cerebellum II Homo sapiens cDNA 5' end similar to similar to neuro-D4 protein


A.europaeum mRNA for legumin-like protein


Gallus gallus mRNA for alpha1 integrin, complete cds


Homo sapiens ret finger protein-like 3 (RFPL3), mRNA


zq43f11.r1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:632493 5'


601652154F1 NIH_MGC_82 Homo sapiens cDNA clone IMAGE:3935388 5'


HYPOTHETICAL PROTEIN (ORF 2280)


QV2-PT0012-010300-070-g02 PT0012 Homo sapiens cDNA


Myxococcus xanthus serine/threonine kinase Pkn10 (pkn10) gene, complete cds


Homo sapiens S164 gene, partial cds; PS1 and hypothetical protein genes, complete cds; and S171 gene,
partial cds


Homo sapiens S164 gene, partial cds; PS1 and hypothetical protein genes, complete cds; and S171 gene,
partial cds


Ovis aries CCAAT-enhancer binding protein epsilon gene


Canis familiaris matrix metalloproteinase 9 (MMP-9) mRNA, partial cds


Canis familiaris matrix metalloproteinase 9 (MMP-9) mRNA, partial cds


nw13h03.s1 NCI_CGAP_SS1 Homo sapiens cDNA clone IMAGE:1239221 3'


Hepatitis E virus strain HEV-US2 polyprotein (ORF1), (ORF3), and capsid protein (ORF2) genes, complete
cds


ae33f04.r1 Gessler Wilms tumor Homo sapiens cDNA clone IMAGE:897631 5'


601878746F1 NIH_MGC_55 Homo sapiens cDNA clone IMAGE:4107418 5'


Morone saxatilis myosin heavy chain FM3A (FM3A) mRNA, complete cds


AV704878 ADB Homo sapiens cDNA clone ADBAOH08 5'


Homo sapiens chromosome 21 segment HS21C010


Homo sapiens promyelocyte leukemia zinc finger protein (PLZF) gene, complete cds


PLECTIN


PLECTIN


ns69c12.s1 NCI_CGAP_Pr2 Homo sapiens cDNA clone IMAGE:1188886


H.sapiens NCAM mRNA for neural cell adhesion molecule


H.sapiens NCAM mRNA for neural cell adhesion molecule


AU123327 NT2RM2 Homo sapiens cDNA clone NT2RM2000020 5'


AU123327 NT2RM2 Homo sapiens cDNA clone NT2RM2000020 5'


Top Hit 
Database 
Source 


EST_HUMAN


NT


NT


NT


EST_HUMAN


EST_HUMAN


SWISSPROT


EST_HUMAN


NT


NT


NT


NT


NT


NT


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


NT


NT


SWISSPROT


SWISSPROT


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


AA325216.1


X95508.1


AB000470.1


11418013


AA191097.1


BE972733.1


P31568


AW875475.1


AF159160.1


AF109907.1


AF109907.1


AJ222689.1


AF095824.1


[illegible]


AA736969.1


AF060669.1


AA496739.1


BF241245.1


AF003249.1


[illegible]


AL163210.2


AF060568.1


P30427


P30427


AA652266.1


[illegible]


X55322.1


AU123327.1


AU123327.1


BLAST E
Value


4.4E-02


4.53


0.43


2.11


0.94


0.59


8.93


1.25


4.62


4.62


0.73


0.95


0.95


1.85


2.58


ORF SEQ 
ID NO; 


i 36715| 


! 36875| 


36891 | 


318231 


31440| 






28527| 


1 29612| 


30563 


30584 




33639 1 


336401 


35497| 


37886 


[ 38025! 




CO 


i 28595| 


0 




| 32977 


j 32978 


j 33245 


1 35548 


| 35547 


! 26832 




Exon 
SEQ ID 
NO: 


| 23233| 


| 23382I 


| 23499| 


1 25203| 


1 258631 


13322| 


151241 


15501| 


166971 


17678 


17678 


177911 


20298 1 


I 

CN 


22071 | 


24353 


24474I 


t O 

I 3 
M CM 


138411 


15576| 


164901 


| 16714| 


I 19702! 


| 19702! 


I 19948> 


1 22119 


0 


I 13881! 


| 139231 


Probe 
SEQ ID 

NO: 


| 10309| 


I 104601 


| 10577| 


! 12440| 


1 128231 




1 21071 


| 2498 | 


j 3654] 


4657 


! 4657I 


| 47711 


I 7325I 


! 73251 


9105


11533


12346


Top Hit Descriptor


Fugu rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) gene,
complete cds; and calcium channel alpha-1 subunit;*


ADAM-TS 1 PRECURSOR (A DISINTEGRIN AND METALLOPROTEINASE WITH THROMBOSPONDIN
MOTIFS 1) (ADAMTS-1) (ADAM-TS1)


CUTICLE COLLAGEN 34


EST84291 Colon adenocarcinoma IV Homo sapiens cDNA 5' end


Brassica napus gln gene for plastid glutamine synthetase, exons 1-12


wb98h01.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2313745 3'


Homo sapiens mRNA for KIAA1471 protein, partial cds


Homo sapiens cytochrome P450 polypeptide 43 (CYP3A43) gene, partial cds; cytochrome P450 polypeptide
4 (CYP3A4) and cytochrome P450 polypeptide 7 (CYP3A7) genes, complete cds; and cytochrome P450
polypeptide 5 (CYP3A5) gene, partial cds


7n52h07.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3568380 3' similar to TR:075296 075298
R29124J.;


Strongylocentrotus purpuratus homolog of human bone morphogenetic protein 1 (submp) mRNA, complete
cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 35


Homo sapiens DNA for GPI-anchored molecule-like protein, complete cds


Homo sapiens DNA for GPI-anchored molecule-like protein, complete cds


GLUCOAMYLASE S1/S2 PRECURSOR (GLUCAN 1,4-ALPHA-GLUCOSIDASE) (1,4-ALPHA-D-GLUCAN
GLUCOHYDROLASE)


602153884F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4294724 5'


Methanobacterium thermoautotrophicum strain Marburg, Thiol:fumarate reductase subunit A


Human mRNA for KIAA0082 gene, partial cds


Ovis aries mRNA for acetyl-coA carboxylase


FAS ANTIGEN LIGAND


M.musculus DNA for desmin-binding fragment DesD7


Homo sapiens succinate dehydrogenase complex, subunit C, integral membrane protein, 15kD (SDHC)
mRNA


RC6-ST0258-171199-021-C09 ST0258 Homo sapiens cDNA


Top Hit 
Database 
Source 


NT 


SWISSPROT 


SWISSPROT


EST_HUMAN


NT


EST_HUMAN


NT


SWISSPROT


NT


NT


EST_HUMAN


Top Hit Accession
No.


AF026198.1


P97857


P34687


AA372398.1


AJ271909.1


AI675392.1


AB040804.1


AF280107.1


BF110434.1


L23838.1


AL161535.2


AB000381.1


AB000381.1


P08640


BF679376.1


AJ000941.1


D43949.1


AJ001018.1


AJ001056.1


BF516149.1


P41047


AJ403386.1


4506862


AW392417.1


Most Similar 
(Top) Hit 
BLAST E 
Value


ORF SEQ


35361


35916


27654


31466


32649


34288


[illegible]


SEQ ID

NO:


25909


14681


16312


19408


22954


23268


24941


25730


14166


14383


14999


15709


18200


Top Hit Descriptor


za39a10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:294906 5' similar to contains
element TAR1 repetitive element ;


za39a10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:294906 5' similar to contains
element TAR1 repetitive element ;


Cyprinus carpio mRNA for inducible nitric oxide synthase (iNOS gene)


601512206F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3913848 5'


601512206F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3913848 5'


Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1) gene, complete
cds


Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1) gene, complete
cds


Human dystrophin gene


601854981F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4074548 5'


602154364F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4295654 5'


IL5-HT0704-290600-108-C04 HT0704 Homo sapiens cDNA


Ornithorhynchus anatinus coagulation factor X mRNA, complete cds


Thermotoga maritima section 109 of 136 of the complete genome


Human coagulation factor VII (F7) gene exon 1 and factor X (F10) gene, exon 1


ne87f04.s1 NCI_CGAP_Kid1 Homo sapiens cDNA clone IMAGE:911263


yh63d04.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:134407 3'


[illegible]


Homo sapiens mitochondrial glutathione reductase and cytosolic glutathione reductase (GRD1) gene,
complete cds, alternatively spliced


601338428F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3680695 5'


601338428F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3680695 5'


Sheep gene for ultra high-sulphur keratin protein


yu07e10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:233130 5'


601452661F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3856598 5'


Neisseria meningitidis DNA for region 2 (fhaB- and fhaC-homologs, unknown genes) and flanking genes,
strain FAM18


601140729F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3049830 5'


Most Similar
(Top) Hit


Top Hit Descriptor


Mus musculus sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6B
(Sema6b), mRNA


Arabidopsis thaliana C2H2 zinc finger protein FZF mRNA, complete cds


Homo sapiens chromosome 21 segment HS21C078


Caenorhabditis elegans sma-2 mRNA, complete cds


Dictyostelium discoideum class VII unconventional myosin (myoI) gene, complete cds


Pyrococcus horikoshii OT3 genomic DNA, 777001-994000 nt position (4/7)


Pyrococcus horikoshii OT3 genomic DNA, 777001-994000 nt position (4/7)


Japanese encephalitis virus envelope protein mRNA, partial cds


Mycobacterium tuberculosis H37Rv complete genome; segment 93/162


nf19a07.s1 NCI_CGAP_Pr1 Homo sapiens cDNA clone IMAGE:914196 similar to contains LU1 L1
repetitive element ;


Homo sapiens chromosome 21 segment HS21C103


Homo sapiens chromosome 21 segment HS21C103


qn04c07.x1 NCI_CGAP_Lu5 Homo sapiens cDNA clone IMAGE:1897260 3' similar to contains Alu repetitive
element;


HOMEOTIC BICOID PROTEIN (PRD-4)


HOMEOTIC BICOID PROTEIN (PRD-4)


Top Hit Descriptor


tj46d04.x1 Soares_NSF_F8_9W_OT_PA_P_S1 Homo sapiens cDNA clone IMAGE:2144551 3' similar to
contains Alu repetitive element;


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 50


Mus musculus T cell receptor gamma locus, TCR gamma 1 and gamma 3 gene clusters


Meleagris gallopavo paraoxonase-2 (PON2) mRNA, complete cds


Drosophila kanekoi gene for glycerol-3-phosphate dehydrogenase, complete cds


Homo sapiens interferon-gamma receptor alpha chain gene, exon 1


Homo sapiens interferon-gamma receptor alpha chain gene, exon 1


Neisseria meningitidis serogroup A strain Z2491 complete genome; segment 3/7


601888130F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4125462 5'


Nicotiana tabacum type II phytochrome (phyB) gene, complete cds


601852385F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4076253 5'


Synechocystis sp. PCC6803 complete genome, 20/27, 2539000-2644794


Hirudo medicinalis intermediate filament gliarin mRNA, complete cds


H.sapiens MUC18 gene exon 16


hn52c06.x1 NCI_CGAP_Co17 Homo sapiens cDNA clone IMAGE:3027274 3' similar to contains element
MER29 repetitive element ;


601894329F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4139983 5'


H.francisci mRNA for myelin basic protein (MBP)


Pseudomonas aeruginosa PAO1, section 105 of 529 of the complete genome


MR1-OT0011-280300-009-g04 OT0011 Homo sapiens cDNA


MR1-OT0011-280300-009-g04 OT0011 Homo sapiens cDNA


ak24h04.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1406935 3'


QV4-DT0021-301299-071-b11 DT0021 Homo sapiens cDNA


HYPOTHETICAL PROTEIN DJ845024.2


Oryza sativa putative histone deacetylase HD2 mRNA, complete cds


Neisseria meningitidis serogroup B strain MC58 section 160 of 206 of the complete genome


Neisseria meningitidis serogroup B strain MC58 section 160 of 206 of the complete genome


HYPOTHETICAL 7.9 KD PROTEIN IN FIXW 5'REGION


601763268F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:4026280 5'


601763268F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:4026280 5'


Mus musculus carbonic anhydrase IV gene, complete cds


Top Hit Descriptor


ti22c02.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2131202 3' similar to SW:R13A_HUMAN
P40429 60S RIBOSOMAL PROTEIN L13A ;


Bacillus subtilis fenD gene


Homo sapiens okadaic acid-inducible and cAMP-regulated phosphoprotein 19 (ARPP-19) mRNA, complete
cds


M.thermoformicicum complete plasmid pFV1 DNA


EST374237 MAGE resequences, MAGG Homo sapiens cDNA


Homo sapiens hypothetical zinc finger protein FLJ14011 (FLJ14011), mRNA


Mus musculus zinc-finger protein mRNA, complete cds


601572746F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:3839747 5'


Rhodobacter capsulatus strain SB 1003, partial genome


602151024F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4292212 5'


Methanobacterium thermoautotrophicum from bases 429192 to 450296 (section 39 of 148) of the complete
genome


Pneumocystis carinii f. sp. ratti guanine nucleotide binding protein alpha subunit (pcgl) gene, complete cds


SYNAPTONEMAL COMPLEX PROTEIN 1 (SCP-1 PROTEIN)


601482621F1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3885388 5'


Brassica napus slg gene for S-locus glycoprotein, cultivar T2


Chlamydia trachomatis partial ORFB; aminoacyl-tRNA synthase, complete cds; complete ORFA, and grpE-like protein, complete cds


Chlamydia trachomatis partial ORFB; aminoacyl-tRNA synthase, complete cds; complete ORFA, and grpE-like protein, complete cds


Chlamydia trachomatis partial ORFB; aminoacyl-tRNA synthase, complete cds; complete ORFA, and grpE-like protein, complete cds


Chlamydia trachomatis partial ORFB; aminoacyl-tRNA synthase, complete cds; complete ORFA, and grpE-like protein, complete cds


Arabidopsis thaliana mRNA for DEAD box RNA helicase, RH3


qd79d05.x1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1735689 3'


Homo sapiens mRNA for KIAA1180 protein, partial cds


601194796F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3538799 5'


yc81f09.s1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:22395 3'


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 3


Top Hit 
Database 
Source 


EST_HUMAN


NT


NT


NT


EST_HUMAN


NT


NT


EST_HUMAN


NT


EST_HUMAN


NT


NT


SWISSPROT


EST_HUMAN


NT


EST_HUMAN


NT


NT


NT


NT


NT


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


Top Hit Accession
No.


AI432681.1


AJ011849.1


AF084555.1


X68366.1


AW962164.1


11545814


U14556.1


BE737895.1


AF010496.1


BF671185.1


AE000833.1


U30790.1


Q62209


BE788019.1


AJ245480.1


BF110298.1


L25105.1


L25105.1


L25105.1


L25105.1


AJ010457.1


AI138977.1


[illegible]


BE266057.1


T87623.1


[illegible]


Most Similar 
(Top) Hit 
BLAST E 
Value 


6.0E-03 


6.0E-03| 


CO 

S 

o 

CO 


6.0E-03| 


co 

9 

UJ 
o 

CO 


6.0E-03) 


6.0E-03| 


6.0E-03| 


6.0E-03| 


6.0E-03| 


6.0E-03 


6.0E-03 


6.0E-03| 


6.0E-03| 


6.0E-03[ 


6.0E-03| 


5.0E-03 


5.0E-03 


5.0E-03 


CO 

S 

o 
in' 


5.0E-03| 


5.0E-03 


CO 

9 

LU 
o 
in 


5.0E-03| 


5.0E-03| 


5.0E-03| 


Expression 
Signal 


2.08 


0.87


1.03


0.68


1.61


1.55


3.99


2.65


2.28


1.52


5.26


[illegible]


1.48


2.10


1.53




[illegible]


2.34


3.43


3.43


1.03


1.02


2.63


3.66


3.96


3.05


ORFSEQ 
ID NO: 




co 

1 




37187


37580






37805


















1 


26662 


26661 


26662 






28703 


28930


29125




Exon
SEQ ID
NO:


[illegible]


23445


23581


23690


24056


24120


24277


24278


25123


25812


25744


25807


25285


25459


25471


25584


13735


13735


13735


13735


14158


14607


15686


16005


16210


16224


Probe 
SEQ ID 
NO: 


10403 


10523


10659


10769


11096


11162


11327


11328


12319


12422


12446


12525


12576


12850


12669


13043


S 


© 

co 


£ 

CO 




1114


1574


2690


2947


3153


3169



179/546 



WO 01/57276 



PCT/US01/00668 



Top Hit Descriptor 


yj86g02.s1 Soares breast 2NbHBst Homo sapiens cDNA clone IMAGE:155666 3'


Homo sapiens partial LIMD1 gene for LIM domains containing protein 1 and KIAA0851 gene


Homo sapiens chromosome 21 segment HS21C085


Pseudomonas aeruginosa strain PAO1 penicillin-binding protein 1B (ponB) gene, complete cds


Citrus sinensis seed storage protein citrin mRNA, complete cds


EST12218 Uterus tumor I Homo sapiens cDNA 5' end


yu70g10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:540066 5'


Citrus sinensis seed storage protein citrin mRNA, complete cds


Human putative chromatin structure regulator (SUPT6H) mRNA, complete cds


Homo sapiens SCL gene locus


cn15c02.x1 Normal Human Trabecular Bone Cells Homo sapiens cDNA clone NHTBC_cn15c02 random


SPERM MITOCHONDRIAL CAPSULE SELENOPROTEIN (MCS)


Mus musculus glucosamine-6-phosphate deaminase (Gnpi), mRNA


SODIUM CHANNEL PROTEIN PARA (PARALYTIC PROTEIN)


PROBABLE UBIQUITIN CARBOXYL-TERMINAL HYDROLASE FAF-Y (UBIQUITIN THIOLESTERASE
FAF-Y) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE FAF-Y) (DEUBIQUITINATING ENZYME FAF-Y)
(FAT FACETS PROTEIN RELATED, Y-LINKED) (UBIQUITIN-SPECIFIC PROTEASE 9, Y
CHROMOSOME)


Chlamydophila pneumoniae AR39, section 62 of 94 of the complete genome


600944564T1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:2960871 3'


Mus musculus AMD1 gene for S-adenosylmethionine decarboxylase, complete cds


Tursiops truncatus mRNA for p40-phox, complete cds


Mus musculus dynein, axon, heavy chain 11 (Dnahc11), mRNA


EST03012 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCR93 similar to EST
containing Alu repeat


RC3-CT0255-031099-011-f07 CT0255 Homo sapiens cDNA


Homo sapiens MASL1 mRNA, complete cds


ADAM-TS 5 PRECURSOR (A DISINTEGRIN AND METALLOPROTEINASE WITH THROMBOSPONDIN
MOTIFS 5) (ADAMTS-5) (ADAM-TS5) (AGGRECANASE-2) (ADMP-2) (IMPLANTIN)


ADAM-TS 5 PRECURSOR (A DISINTEGRIN AND METALLOPROTEINASE WITH THROMBOSPONDIN
MOTIFS 5) (ADAMTS-5) (ADAM-TS5) (AGGRECANASE-2) (ADMP-2) (IMPLANTIN)


BETA-GALACTOSIDASE PRECURSOR (LACTASE)


Mouse complement receptor (CR2) mRNA, 3' end


Top Hit 
Database 
Source 


EST_HUMAN


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


NT


NT


EST_HUMAN


SWISSPROT


NT


SWISSPROT


SWISSPROT


NT


EST_HUMAN


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


SWISSPROT


SWISSPROT


SWISSPROT


NT


Top Hit Accession
No.


R71794.1


AJ297357.1


AL163285.2


AF147449.2


U38914.1


AA299675.1


H78355.1


U38914.1


U46691.1


AJ131016.1


AI752367.1


P15265


6754029


P35500


O00507


AE002234.2


BE300091.1


AB025024.1


AB038267.1


[illegible]


T05124.1


AW854327.1


AB016816.1


Q9R001


Q9R001


[illegible]


M61132.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


5.0E-03


[illegible]


5.0E-03


5.0E-03


5.0E-03


5.0E-03


Expression 
Signal 


1.22


0.84


0.97


[illegible]




[illegible]


[illegible]


1.02


1.1[illegible]


1.34


1.08




5.69


2.97


0.89


7.44


7.12


0.82


0.57


[illegible]


1.17


[illegible]


0.49


0.49


2.12


5.83


ORFSEQ 
ID NO: 


29153




29635


29670


29722




30246


29722


30515


30546


30665


30875


31105


32172


00 

e> 






31274




33602


34047




34378


34431 


34432 


i 




J 


SEQ ID 

NO: 


16236


16344


16722


16756


16813


17035


17361


16813


17622


17659


17769


[illegible]


18230


18981


19237


19272


19801


18355


[illegible]


20267


20683


20801


20981


21033


21033


21549


21925




SEQ ID 

NO: 


3181


3291


[illegible]


3713


3771


[illegible]


4333


4335


4601


4638


4749


[illegible]


[illegible]


[illegible]


6162


6198


6747


7023


7240


7295


7727


7858


8044


8097


8581


Top Hit Descriptor


EST11191 Uterus Homo sapiens cDNA 5' end similar to EST containing O family repeat


Homo sapiens cell cycle progression 3 protein (DNJ3) mRNA


Mus musculus G protein coupled receptor gene, complete cds; and unknown gene


AU121712 MAMMA1 Homo sapiens cDNA clone MAMMA1000798 5'


QV0-CT0387-180300-167-e10 CT0387 Homo sapiens cDNA


LINE-1 REVERSE TRANSCRIPTASE HOMOLOG


MYOMESIN 2 (M-PROTEIN) (165 KD TITIN-ASSOCIATED PROTEIN) (165 KD
CONNECTIN-ASSOCIATED PROTEIN)


DKFZp434L2023_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434L2023 5'


DKFZp434L2023_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434L2023 5'


Solanum lycopersicum phytochrome F (PHYF) gene, partial cds


Solanum lycopersicum phytochrome F (PHYF) gene, partial cds


Homo sapiens DNA, DLEC1 to ORCTL4 gene region, section 1/2 (DLEC1, ORCTL3, ORCTL4 genes,
complete cds)


Homo sapiens DNA, DLEC1 to ORCTL4 gene region, section 1/2 (DLEC1, ORCTL3, ORCTL4 genes,
complete cds)


Homo sapiens FRA3B common fragile region, diadenosine triphosphate hydrolase (FHIT) gene, exon 9


Human immunoglobulin C(mu) and C(delta) heavy chain genes (constant regions)


[illegible]


GASTRULA ZINC FINGER PROTEIN XLCGF26.1


RC3-HT0254-151099-011-b05 HT0254 Homo sapiens cDNA


zu68c11.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:742964 5'


AV730373 HTF Homo sapiens cDNA clone HTFAAA01 5'


Homo sapiens partial 5-HT4 receptor gene, exons 2 to 5


[illegible]


yx26c09.s1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:262864 3' similar to contains
L1.t1 L1 repetitive element ;


PERICENTRIN


RETROVIRUS-RELATED POL POLYPROTEIN [CONTAINS: REVERSE TRANSCRIPTASE ;
ENDONUCLEASE]


UI-H-BI0-aab-e-06-0-UI.s1 NCI_CGAP_Sub1 Homo sapiens cDNA clone IMAGE:2708825 3'


Top Hit 
Database 
Source 


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


SWISSPROT


SWISSPROT


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


NT


NT


EST_HUMAN


SWISSPROT


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


SWISSPROT


SWISSPROT


EST_HUMAN


Top Hit Accession
No.


AA296652.1


4758179


AF140708.1


AU121712.1


AW860963.1


P08548


P54296


[illegible]


[illegible]


U32444.2


U32444.2


AB026898.1


AB026898.1


AF020503.1


X57331.1


AA725700.1


P18715


BE149303.1


AA405777.1


AV730373.1


AJ243213.1


AI440282.1


H99646.1


P48725


P11369


AW013847.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


2.0E-04I 


2.0E-04J 


2.0E-04) 


2.0E-04I 


2.0E-04I 


2.0E-04 


2.0E-04 


2.0E-04I 


2.0E-04) 


2.0E-04I 


2.0E-04 


2.0E-04 


2.0E-04 


2.0E-04 


2.0E-04 


2.0E-04! 


2.0E-04| 


2.0E-04! 


2.0E-04 


2.0E-04 


2.0E-04 


?< 

LU L 
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1.0E-04 


I 1.0E-04 


1.0E-04 


3 

p 


Expression 
Signal 


1.16


0.88


[illegible]


[illegible]


0.61


15.1


1.21


0.53


0.53


2.13


2.13


1.21


1.21


[illegible]


[illegible]


0.51


0.65


1.21


2.74


3.56


2.59


[illegible]


0.81


2.03


2.61


4.21


ORF SEQ 
ID NO: 


32117


32349


32878








34215


34508


34509


34667


34568


35015


35016


35303


35486


36100 


36170 


36735 


36776 


37683 




1 

38136 


26771 


26956 


27072 


27110 


Exon 
SEQ ID 
NO:


18933


[illegible]


19435


20407


20509


[illegible]


20835 


21109 


21109 


21257 


21257 


21595 


21595 


21677 


22081 


22642 


22716 


23258 


23300 


24152 


24451 


24572 


13827 


14004 


14121 


14160 


Probe 
SEQ ID 

NO: 


5843 


6057


[illegible]


7440


7546


7882


7892


8170


8170


[illegible]


Top Hit Descriptor


hi37a03.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2974444 3'


yi59d08.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:143535 3' similar to contains Alu


[illegible]


zk58f02.r1 Soares_pregnant_uterus_NbHPU Homo sapiens cDNA clone IMAGE:487035 5'


MR0-NT0038-250400-001-f09 NT0038 Homo sapiens cDNA


QV4-ST0234-241199-040-h11 ST0234 Homo sapiens cDNA


Homo sapiens 22kDa peroxisomal membrane protein-like (LOC55895), mRNA


Homo sapiens partial SLC22A3 gene for extraneuronal monoamine transporter (EMT), exon 1


Human MLC1 emb gene for embryonic myosin alkaline light chain, 3'UTR


AV653544 GLC Homo sapiens cDNA clone GLCDMA06 3'


Homo sapiens TESTIN 2 and TESTIN 3 genes, complete cds, alternatively spliced


Mus musculus gene for calretinin, exon 1


RETINAL-BINDING PROTEIN (RALBP)


RETINAL-BINDING PROTEIN (RALBP)


Human renin (REN) gene, 5' flanking region


RETINAL-BINDING PROTEIN (RALBP)


RETINAL-BINDING PROTEIN (RALBP)


Cryptosporidium parvum isolate Zaire 15 kDa glycoprotein gp15 gene, partial cds


Macaca mulatta haptoglobin (HP) gene, 5' region


Homo sapiens PP1200 mRNA, complete cds


RETROVIRUS-RELATED POL POLYPROTEIN [CONTAINS: REVERSE TRANSCRIPTASE ;
ENDONUCLEASE]


BETA-GALACTOSIDASE PRECURSOR (LACTASE) (ACID BETA-GALACTOSIDASE)


hi36c07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2974380 3' similar to contains
element MIR repetitive element ;


qh64c10.x1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:1849458 3' similar to
contains Alu repetitive element;contains element KER repetitive element ;


[illegible]


601461463F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3865142 5'


601461463F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3865142 5'


PM1-HT0521-120200-001-e10 HT0521 Homo sapiens cDNA


PM1-HT0521-120200-001-e10 HT0521 Homo sapiens cDNA


Top Hit 
Database 
Source 


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


NT


EST_HUMAN


NT


NT


SWISSPROT


SWISSPROT


NT


SWISSPROT


SWISSPROT


NT


NT


NT


SWISSPROT


SWISSPROT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


AW627985.1


R75639.1


AA044015.1


AW890110.1


AW392086.1


8923891


[illegible]


X58855.1


AV653544.1


AF260225.1


AB037964.1


P49193


P49193


U12821.1


P49193


P49193


AF164488.1


U01947.1


AF202635.1


P11369


P23780


AW627946.1


AW117580.1


AA417756.1


AI248061.1


AW273851.1


BF037898.1


BF037898.1


BE169211.1


BE169211.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


6.0E-05


6.0E-05


6.0E-05


6.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


5.0E-05


[illegible]


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


4.0E-05


3.0E-05


3.0E-05


3.0E-05


3.0E-05


3.0E-05


3.0E-05


Expression 
Signal 


0.71


2.27


2.71


16.08


16.34


1.15


3.54


11.74


3.22


0.84


1.18


5.88


[illegible]


4.95


1.68


1.68


[illegible]


0.71


8.43


0.51


0.66


[illegible]


[illegible]


2.29


0.78


[illegible]


0.62


0.82


8.15


8.15


ORF SEQ
ID NO:


36477







38316


31529


27404







31670


32405


32603










30416


30417




33366




36760


37189


37604






26671


27057


27125


27126


30324


30325


SEQ ID 
NO: 


23006 


24060 


24730 


25813


14436


14905


17043


18699


[illegible]


[illegible]


20516


25371


25371


13329


17533


17533


17927


20060


22834


23283


23692


24080


25192


25612


[illegible]


14107 


14176 


14176 


17437 


17437 


Probe 

SEQ ID 
NO: 


10079 


11100 


11847 


12670


1403


1880


4004


6603


6107


6292


7553


12462


12717


2818


4508


4508


4910


7127


9881


10360


10771


11120


12423


13081


[illegible]


1061


1133


1133


[illegible]


4409





EST79996 Placenta I Homo sapiens cDNA similar to similar to p53-associated protein


EST79996 Placenta I Homo sapiens cDNA similar to similar to p53-associated protein


Homo sapiens chromosome 21 segment HS21C102


Mus musculus myosin light chain 2, precursor lymphocyte-specific (Mylc2pl), mRNA


Homo sapiens SYBL1 gene, exons 6-8


Homo sapiens SYBL1 gene, exons 6-8


601567451F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3842292 5'


zs60b05.s1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:701841 3'


hl94e08.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3009638 3'


Homo sapiens interleukin-1 receptor antagonist homolog 1 (IL1HY1), mRNA


MELANOMA-ASSOCIATED ANTIGEN 8 (MAGE-8 ANTIGEN)


Human Alu-family cluster 5' of alpha(1)-acid glycoprotein gene


EST84475 Colon adenocarcinoma IV Homo sapiens cDNA 5' end


wg36f09.x1 Soares_NSF_F8_9W_OT_PA_P_S1 Homo sapiens cDNA clone IMAGE:2367209 3'


PROTEIN KINASE C-BINDING PROTEIN NELL2 PRECURSOR (NEL-LIKE PROTEIN 2)


PROTEIN KINASE C-BINDING PROTEIN NELL2 PRECURSOR (NEL-LIKE PROTEIN 2)


Homo sapiens DiGeorge syndrome critical region, centromeric end


qh98e11.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:1855052 3' similar to contains
MER3.b2 MER3 repetitive element ;


Human adenosine deaminase (ADA) gene, complete cds


zq46a12.r1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:632734 5' similar to
contains Alu repetitive element;contains element L1 repetitive element ;


RC3-BT0319-120200-014-h08 BT0319 Homo sapiens cDNA


Homo sapiens p47-phox (NCF1) gene, complete cds


H.sapiens DNA for endogenous retroviral like element


S.cerevisiae 12.8 Kbp fragment of the left arm of chromosome XV


601236455F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3608653 5'


Homo sapiens TNNT1 gene, exons 1-11 (and joined CDS)


Homo sapiens chromosome 9 duplication of the T cell receptor beta locus and trypsinogen gene families


RENAL SODIUM/DICARBOXYLATE COTRANSPORTER (NA(+)/DICARBOXYLATE
COTRANSPORTER)


Top Hit 
Database 
Source 


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


SWISSPROT


NT


EST_HUMAN


EST_HUMAN


SWISSPROT


SWISSPROT


NT


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


NT


SWISSPROT


Top Hit Accession
No.


AA368679.1


AA368679.1


AL163302.2


11072102


AJ225782.1


AJ225782.1


BE733157.1


AA284049.1


AW770982.1


69124311


[illegible]


X03273.1


AA372562.1


AI769331.1


Q62918


[illegible]


L77570.1


AI286021.1


M13792.1


AF184614.1


AL039107.1


AJ011712.1


AF029308.1


Q13183


Most Similar
(Top) Hit
BLAST E
Value


2.41


2.41


[illegible]


1.76


[illegible]


2.46


1.68


1.54


1.37


0.59


0.51


1.32


[illegible]


ORF SEQ
ID NO:


30405


30406




31895


33267


33268


34609


35087


35641


35644


35649


[illegible]


28365


28605




29126


29331


32373


Exon 
SEQ ID 

NO: 


17518


17518


17641


18733




19971


21199


21663


22210


22214


22218


22450


22628


22948


23821


23821


25147


15343


15588


15719


16211


[illegible]


16552


17740


18943


19107


3359


3382


3505


3820


4720


5852


6024


6082




Top Hit Descriptor 


DKFZp434N219_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434N219 5'


HYPOTHETICAL GENE 48 PROTEIN


Homo sapiens WRN (WRN) gene, complete cds


601822184F1 NIH_MGC_75 Homo sapiens cDNA clone IMAGE:4042413 5'


HYPOTHETICAL 67.9 KD PROTEIN ZK688.8 IN CHROMOSOME III


HYPOTHETICAL 67.9 KD PROTEIN ZK688.8 IN CHROMOSOME III


qg09f09.x1 Soares_placenta_8to9weeks_2NbHP8to9W Homo sapiens cDNA clone IMAGE:1759049 3'
similar to contains LTR8.b2 LTR8 repetitive element ;


hg58g03.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2949844 3' similar to contains Alu
repetitive element;


Homo sapiens mannosidase, beta A, lysosomal (MANBA) gene, and ubiquitin-conjugating enzyme E2D 3
(UBE2D3) genes, complete cds


aq63h11.x1 Stanley Frontal SN pool 2 Homo sapiens cDNA clone IMAGE:2035653


PM1-HT0521-120200-001-f08 HT0521 Homo sapiens cDNA


PM1-HT0521-120200-001-f08 HT0521 Homo sapiens cDNA


yy32f06.s1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:272963 3' similar to contains
L1.t1 L1 repetitive element ;


Homo sapiens extracellular glycoprotein lacritin precursor, gene, complete cds


Homo sapiens chromosome 21 segment HS21C003


Homo sapiens chromosome 21 segment HS21C003


yz11g08.s1 Soares_multiple_sclerosis_2NbHMSP Homo sapiens cDNA clone IMAGE:282782 3'


RHOMBOID PROTEIN (VEINLET PROTEIN)


[illegible]


AV743302 CB Homo sapiens cDNA clone CBFBGD08 5'


AV743302 CB Homo sapiens cDNA clone CBFBGD08 5'


ys74b12.s1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:220511 3' similar to contains MER29
repetitive element ;


IL3-CT0219-160200-064-B06 CT0219 Homo sapiens cDNA


IL3-CT0219-160200-064-B06 CT0219 Homo sapiens cDNA


Top Hit 


Database 
Source 


EST_HUMAN


SWISSPROT


NT


EST_HUMAN


SWISSPROT


SWISSPROT


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


NT


SWISSPROT


EST_HUMAN


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


AL046804.1


Q01033


AF181897.1


BF105159.1


P34678


P34678


AI221083.1


AA515260.1


AW594709.1


AL163303.2


[illegible]


[illegible]


AW293243.1


AI267342.1


BE169208.1


[illegible]


N36113.1


AY005150.1


AL163203.2


AL163203.2


N50109.1


P20350


BE302970.1


AV743302.1


AV743302.1


H87208.1


AW850731.1


AW850731.1


Most Similar
(Top) Hit
BLAST E
Value


Signal


5.01


1.63




1.74


1.95


1.95


1.27


0.75




5.49


17.71


0.53


0.87


0.45


[illegible]


1.04


4.03


3.08


1.42


ORF SEQ
ID NO:






30923




36303


38304




26565


28043


28600


33703


36971


37235


37388


37369


26924




30480


30481


31564


32639


32801


34367


34368


35471


35803


35804 


Exon 
SEQ ID 
NO: 


13821 


16535


18040


20507


22847


22847


13222


13851


15033


15581


20352


23477


23733


23853


23853


13970 


14388 


17589 


17589 


18628 


19397 


19551 


20973 


20973 


22048 


22369 


22369 


Probe 


SEQ ID 
NO: 




3489


5026


7544


9894


nk01b10.s1 NCI_CGAP_Pr11 Homo sapiens cDNA clone IMAGE:1000699 similar to gb:M17886 60S
ACIDIC RIBOSOMAL PROTEIN P1 (HUMAN);


Top Hit Descriptor


Human nucleolar protein (B23) mRNA, complete cds


Homo sapiens chromosome 21 segment HS21C103


602121491F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4278527 5'


Mus musculus sperm tail associated protein (Stap), mRNA


Homo sapiens chromosome 21 segment HS21C009


QV0-OT0033-070300-152-b10 OT0033 Homo sapiens cDNA


H.sapiens DNA for endogenous retroviral like element


R.rattus RYA3 mRNA for a potential ligand-binding protein


[illegible]


7B44C08 Chromosome 7 Fetal Brain cDNA Library Homo sapiens cDNA clone 7B44C08


601458531F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3862086 5'


Homo sapiens alpha NAC mRNA, complete cds


hi51h12.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2975879 3' simile
O76040 ORF2: FUNCTION UNKNOWN. ;


Homo sapiens jun dimerization protein gene, partial cds; cfos gene, complete cds; and unkr


Homo sapiens jun dimerization protein gene, partial cds; cfos gene, complete cds; and unki


yj36e01.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:150840 5' similar tt
SP:HMGC_MOUSE Q02591 HOMEOBOX PROTEIN ;


w!28g07.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2426268 3'


nh08h05.s1 NCI_CGAP_Thy1 Homo sapiens cDNA clone IMAGE:943737 similar to conta
repetitive element ;


R.rattus RYA3 mRNA for a potential ligand-binding protein


EST00738 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCF07


EST00738 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCF07


AU121685 MAMMA1 Homo sapiens cDNA clone MAMMA1000746 5'


nk01b10.s1 NCI_CGAP_Pr11 Homo sapiens cDNA clone IMAGE:1000699 similar to gb:W
ACIDIC RIBOSOMAL PROTEIN P1 (HUMAN);


Top Hit Descriptor


ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146256 3' similar to contains MER29.b3
MER29 repetitive element ;


Homo sapiens envelope protein RIC-6 (env) gene, complete cds


Homo sapiens envelope protein RIC-6 (env) gene, complete cds


wr65d10.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2492563 3' similar to TR:O15546 O15546
HERV-E ENVELOPE GLYCOPROTEIN ;


wr65d10.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2492563 3' similar to TR:O15546 O15546
HERV-E ENVELOPE GLYCOPROTEIN ;


Homo sapiens chromosome 21 segment HS21C068


POTENTIAL PHOSPHOLIPID-TRANSPORTING ATPASE VA


os71e04.x1 NCI_CGAP_GC2 Homo sapiens cDNA clone IMAGE:1610814 3' similar to contains L1.t2 L1
repetitive element ;


wf27g07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2356860 3' similar to contains
element MER6 repetitive element ;


wf27g07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2356860 3' similar to contains
element MER6 repetitive element ;


601442206F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3846648 5'


Homo sapiens DNA-binding protein (LOC56242), mRNA


Homo sapiens DNA-binding protein (LOC56242), mRNA


Homo sapiens chromosome 21 segment HS21C048


Homo sapiens chromosome 21 segment HS21C048


Homo sapiens chromosome 21 segment HS21C048


Homo sapiens chromosome 21 segment HS21C048


Homo sapiens splicing factor similar to dnaJ (SPF31), mRNA


QV0-OT0032-080300-155-d01 OT0032 Homo sapiens cDNA


RC1-HN0003-220300-021-b04 HN0003 Homo sapiens cDNA


R.rattus RYA3 mRNA for a potential ligand-binding protein


nz20c07.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1288332 3' similar to contains MER4.b1
MER4 repetitive element ;


Homo sapiens zinc/iron regulated transporter-like (ZIRTL), mRNA


HSC23F051 normalized infant brain cDNA Homo sapiens cDNA clone c-23f05


EST97317 Thymus I Homo sapiens cDNA 5' end similar to EST containing O family repeat


Top Hit Descriptor


PM4-BT0724-150400-004-d11 BT0724 Homo sapiens cDNA


Human lambda-immunoglobulin constant region complex (germline)


Human mRNA for integrin alpha subunit, complete cds


QV0-BN0147-290400-214-f12 BN0147 Homo sapiens cDNA


QV0-BN0147-290400-214-f12 BN0147 Homo sapiens cDNA


Homo sapiens CTCL tumor antigen Se20-10 mRNA, partial cds


Human lambda-immunoglobulin constant region complex (germline)


tg92g03.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:2116276 3' similar to contains Alu
repetitive element;


Human aconitate hydratase (ACO2) gene, exon 7


Homo sapiens chromosome 21 segment HS21C078


Homo sapiens chromosome 21 segment HS21C010


QV3-DT0043-090200-080-c06 DT0043 Homo sapiens cDNA


RETROVIRUS-RELATED POL POLYPROTEIN [CONTAINS: REVERSE TRANSCRIPTASE ;
ENDONUCLEASE]


CM1-ST0181-091199-035-f08 ST0181 Homo sapiens cDNA


qq93c05.x1 Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone IMAGE:1938920 3' similar to
contains MER29.b2 MER29 repetitive element ;


Homo sapiens telomerase reverse transcriptase (TERT) gene, exons 1-6


b12056t Testis 1 Homo sapiens cDNA clone b12056


Rattus norvegicus putative four repeat ion channel mRNA, complete cds


Rattus norvegicus putative four repeat ion channel mRNA, complete cds


ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146256 3' similar to contains MER29.b3
MER29 repetitive element ;


Homo sapiens mRNA for KIAA1143 protein, partial cds


Homo sapiens mRNA for KIAA1143 protein, partial cds


TRANSCRIPTION FACTOR AP-2


CM0-CT0307-310100-158-h03 CT0307 Homo sapiens cDNA


HSC23F051 normalized infant brain cDNA Homo sapiens cDNA clone c-23f05


RC5-HT0582-110400-013-H08 HT0582 Homo sapiens cDNA


IL2-NT0101-280700-116-E04 NT0101 Homo sapiens cDNA


Homo sapiens Y-linked zinc finger protein (ZFY) gene, complete cds


ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146256 3' similar to contains MER29.b3
MER29 repetitive element ;


ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146256 3' similar to contains MER29.b3
MER29 repetitive element ;


H.sapiens DMA, DMB, HLA-Z1, IPP2, LMP2, TAP1, LMP7, TAP2, DOB, DQB2 and RING8, 9, 13 and 14
genes


jDKFZp434E0422ji 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434E0422 5' j 


Homo sapiens jun dimerizatlon protein gene, partial cds; cfos gene, complete cds; and unknown gene 


Homo sapiens jun dimerization protein gene, partial cds; cfos gene, complete cds; and unknown gene 


[EST380899 MAGE resequences, MAGJ Homo sapiens cDNA | 


Wk25b11.x1 NCI_CGAP_Bm25 Homo sapiens cDNA done IMAGE:2413341 3' similar to contains PTR5.t2 
PTR5 repetitive element ; 


tm87g03j<1 NCI_CGAP_Brn25 Homo sapiens cDNA done IMAGE:2165140 3' similar to contains L1 .b3 Li 
repetitive element ; 


| Homo sapiens protocadherin alpha 10 alternate isoform (PCDH-alpha10) mRNA, complete cds j 


|Homo sapiens Sad1 unc-84 domain protein 2 (SUN2) mRNA, partial cds ] 


| EST1 78035 Colon carcinoma (HCC) ceil line Homo sapiens cDNA 5' end | 


|EST1 78035 Colon carcinoma (HCC) cdl line Homo sapiens cDNA 5* end | 


|AV750211 NPC Homo sapiens cDNA done NPCBGH09 5' 1 


{Homo sapiens glycine C-acetyi transferase (2-amino-3-ketobutyrate-CoA ligase) (GCAT), mRNA j 


|Homo sapiens NOD1 protein (NOD1) gene, exons 1, 2, and 3 | 
Human endogenous retroviral DNA (4-1), complete retroviral segment


EST52g10 WATM1 Homo sapiens cDNA clone 52g10 similar to human STS G04101
Human ribosomal protein L23a mRNA, complete cds


FB1G5 Fetal brain, Stratagene Homo sapiens cDNA clone FB1G5 3'end similar to LINE-1


Homo sapiens Ras-like GTP-binding protein (RAB27A) gene, exons 1b and 2


Homo sapiens Ras-like GTP-binding protein (RAB27A) gene, exons 1b and 2


Homo sapiens chromosome 21 segment HS21C084


602022313F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4157668 5'


Homo sapiens pyruvate dehydrogenase kinase, isoenzyme 3 (PDK3) mRNA


Homo sapiens Sp4 transcription factor (SP4) mRNA


Homo sapiens Sp4 transcription factor (SP4) mRNA
SP:BD38_MOUSE P28653 BRAIN PROTEIN DN38 ;


Homo sapiens vacuolar sorting protein 35 (VPS35) mRNA, complete cds


Homo sapiens 8q22.1 region and MTG8 (CBFA2T1) gene, partial cds
Homo sapiens calcium channel, voltage-dependent, alpha 1E subunit (CACNA1E), mRNA
wb99b04.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2313775 3'


Homo sapiens cadherin EGF LAG seven-pass G-type receptor 1 (CELSR1), mRNA
H.sapiens DNA for Cone cGMP-PDE gene


Homo sapiens small proline-rich protein 2C (SPRR2C), mRNA


Homo sapiens small proline-rich protein 2C (SPRR2C), mRNA


Homo sapiens mRNA for thymidine kinase, partial


Homo sapiens myosin mRNA, partial cds


Homo sapiens polymerase (RNA) II (DNA directed) polypeptide F (POLR2F), mRNA


Homo sapiens putative nuclear protein (HRIHFB2122), mRNA


Homo sapiens protein kinase C, alpha binding protein (PRKCABP), mRNA


Homo sapiens putative nuclear protein (HRIHFB2122), mRNA
at75h09.x1 Barstead colon HPLRB7 Homo sapiens cDNA clone IMAGE:2377889 3' similar to TR:O60844
O60844 HOMOLOG OF RAT ZYMOGEN GRANULE MEMBRANE PROTEIN. ;


AU123240 NT2RM1 Homo sapiens cDNA clone NT2RM1000978 5'


601310479F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3632083 5'


Homo sapiens aminoacylase 1 (ACY1), mRNA


hk61b03.x1 NCI_CGAP_Lym12 Homo sapiens cDNA clone IMAGE:3001133 3' similar to gb:X64707
BREAST BASIC CONSERVED PROTEIN 1 (HUMAN);


hk61b03.x1 NCI_CGAP_Lym12 Homo sapiens cDNA clone IMAGE:3001133 3' similar to gb:X64707
BREAST BASIC CONSERVED PROTEIN 1 (HUMAN);


Homo sapiens mRNA for KIAA1209 protein, partial cds
Homo sapiens tousled-like kinase 1 (TLK1), mRNA


Homo sapiens SET domain and mariner transposase fusion gene (SETMAR) mRNA


Homo sapiens histidyl-tRNA synthetase (HARS), mRNA


Homo sapiens mRNA for AIE-75, complete cds


Homo sapiens BMX non-receptor tyrosine kinase (BMX), mRNA


Homo sapiens mRNA for KIAA1624 protein, partial cds


Homo sapiens mRNA for KIAA1624 protein, partial cds
contains Alu repetitive element;
Homo sapiens serologically defined colon cancer antigen 10 (SDCCAG10), mRNA


Homo sapiens pescadillo (zebrafish) homolog 1, containing BRCT domain (PES1), mRNA


601899230F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4128535 5'


Homo sapiens similar to nuclear factor related to kappa B binding protein (H. sapiens) (LOC63182), mRNA


zu10e09.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:731464 5'


AU077341 Sugano cDNA library Homo sapiens cDNA clone Zrv6C880 similar to 5'-end region of Human
gamma-glutamyl transpeptidase mRNA, 5' end


QV2-BT0635-160400-143-M 2 BT0635 Homo sapiens cDNA


Homo sapiens RFB30 gene for RING finger protein


Homo sapiens RFB30 gene for RING finger protein


fh02a02.x1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:2960907 5'


hw08d06.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3182315 3' similar to TR:Q9Z1J8
Q9Z1J8 45 KDA SECRETORY PROTEIN ;


yf26e04.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:127998 5' similar to
SP:C561_BOVIN P10897 CYTOCHROME ;
yq78d03.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:201893 5'


ym57g07.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:52444 5'


Homo sapiens mRNA for KIAA1501 protein, partial cds
Homo sapiens arylsulfatase E (chondrodysplasia punctata 1) (ARSE), mRNA


Homo sapiens Rho GTPase activating protein 6 (ARHGAP6), transcript variant 5, mRNA


Homo sapiens speckle-type POZ protein (SPOP), mRNA
Homo sapiens protein tyrosine phosphatase, receptor type, alpha polypeptide (PTPRA) mRNA
Top Hit Descriptor


Homo sapiens sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) (SPOCK) mRNA


Homo sapiens sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) (SPOCK) mRNA


Homo sapiens bone morphogenetic protein 5 (BMP5), mRNA


Homo sapiens KIAA0317 gene product (KIAA0317), mRNA


Homo sapiens nuclear pore complex interacting protein (NPIP), mRNA


Homo sapiens nuclear pore complex interacting protein (NPIP), mRNA


Homo sapiens NACP/alpha-synuclein gene, exon 5


Homo sapiens NACP/alpha-synuclein gene, exon 5


Homo sapiens caveolin 3 (CAV3), mRNA


Homo sapiens caveolin 3 (CAV3), mRNA


zq52a08.s1 Stratagene neuroepithelium (#937231) Homo sapiens cDNA clone IMAGE:645206 3'


RC4-BT0310-110300-015-f10 BT0310 Homo sapiens cDNA


RC4-BT0310-110300-015-f10 BT0310 Homo sapiens cDNA


Human cGMP phosphodiesterase alpha subunit (CGPR-A) mRNA, complete cds


Human cGMP phosphodiesterase alpha subunit (CGPR-A) mRNA, complete cds


Homo sapiens mRNA for KIAA1414 protein, partial cds


Homo sapiens gene for activin receptor type IIB, complete cds


AV703184 ADB Homo sapiens cDNA clone ADBCFG10 5'


Homo sapiens SET domain and mariner transposase fusion gene (SETMAR) mRNA


Macaca fascicularis protein tyrosine phosphatase (PRL-1) mRNA, complete cds


hg23c11.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2946452 3'


hg23c11.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2946452 3'


QV-BT077-130199-079 BT077 Homo sapiens cDNA
RC2-CT0163-220999-001-E02 CT0163 Homo sapiens cDNA
Homo sapiens serine protease 17 (KLK4) gene, complete cds


Homo sapiens serine protease 17 (KLK4) gene, complete cds


Homo sapiens mRNA for cyclin B2, complete cds
7f33b10.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:3296443 3' similar to WP:Y47H9C.2
CE20263 ;


7f33b10.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:3296443 3' similar to WP:Y47H9C.2
CE20263 ;


RC3-CT0254-110300-027-d10 CT0254 Homo sapiens cDNA


Homo sapiens angiotensin I converting enzyme (peptidyl-dipeptidase A) 2 (ACE2), mRNA


601589896F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3S44302 5'


42f6 Human retina cDNA randomly primed sublibrary Homo sapiens cDNA


Homo sapiens hypothetical protein FLJ11656 (FLJ11656), mRNA


Homo sapiens hypothetical protein FLJ11656 (FLJ11656), mRNA


Homo sapiens KIAA0649 gene product (KIAA0649), mRNA


Human farnesyl pyrophosphate synthetase mRNA, complete cds


AU117659 HEMBA1 Homo sapiens cDNA clone HEMBA1001910 5'


Homo sapiens hypothetical protein FLJ11656 (FLJ11656), mRNA


Homo sapiens hypothetical protein FLJ11656 (FLJ11656), mRNA
Homo sapiens SNARE protein kinase SNAK mRNA, complete cds


Homo sapiens chromosome 21 segment HS21C004


ye98h01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:125809 5'
Homo sapiens chromosome 21 segment HS21C083


Homo sapiens chromosome 21 segment HS21C006


re31c05.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:360584 5' similar to contains L1.t3 L1
repetitive element ;


7n80f04.x1 NCI_CGAP_Ov18 Homo sapiens cDNA clone IMAGE:3570966 3' similar to contains TAR1.t1
MER22 repetitive element ;
Homo sapiens placenta-specific 1 (PLAC1), mRNA
ab07h04.r1 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:840151 5' similar to contains
LTR10.M LTR10 repetitive element;


Homo sapiens solute carrier (SLC25A18) mRNA, complete cds; nuclear gene for mitochondrial product


Human bcr protein mRNA, 5' end


Homo sapiens solute carrier (SLC25A18) mRNA, complete cds; nuclear gene for mitochondrial product
nn01f12.y5 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1076495 5' similar to contains THR.t1 THR
repetitive element;


Homo sapiens pro-alpha 2(I) collagen (COL1A2) gene, complete cds
Homo sapiens corticotropin releasing hormone receptor 2 (CRHR2) mRNA
HS15BEST human adult testis Homo sapiens cDNA clone CAMJEST15


Human pre-B cell stimulating factor homologue (SDF1b) mRNA, complete cds


Homo sapiens sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
(SEMA6A), mRNA


Homo sapiens sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6A
(SEMA6A), mRNA
Homo sapiens Down syndrome candidate region 1 (DSCR1), mRNA
Homo sapiens jun dimerization protein gene, partial cds; cfos gene, complete cds; and unknown gene


EST380820 MAGE resequences, MAGJ Homo sapiens cDNA


Homo sapiens Ran GTPase activating protein 1 (RANGAP1), mRNA


au75d02.x1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2782083 3' similar to gb:M37104
ATP SYNTHASE COUPLING FACTOR 6, MITOCHONDRIAL PRECURSOR (HUMAN);


EST96812 Testis I Homo sapiens cDNA 5' end similar to similar to C. elegans hypothetical protein, cosmid
Homo sapiens inositol 1,3,4-triphosphate 5/6 kinase (ITPK1), mRNA


au75d02.x1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2782083 3' similar to gb:M37104
ATP SYNTHASE COUPLING FACTOR 6, MITOCHONDRIAL PRECURSOR (HUMAN);


Homo sapiens zinc finger protein 304 (ZNF304), mRNA


Homo sapiens adaptor-related protein complex 2, beta 1 subunit (AP2B1), mRNA


Homo sapiens adaptor-related protein complex 2, beta 1 subunit (AP2B1), mRNA


Homo sapiens ATPase, H+ transporting, lysosomal (vacuolar proton pump) non-catalytic accessory protein
1A (110/116kD) (ATP6N1A), mRNA


Homo sapiens mitochondrial carrier family protein (LOC55972), mRNA


Homo sapiens mitochondrial carrier family protein (LOC55972), mRNA


Homo sapiens latent transforming growth factor beta binding protein 2 (LTBP2) mRNA


Homo sapiens retinaldehyde dehydrogenase 2 (RALDH2), mRNA


Human cytochrome oxidase subunit VIa (COX6A1P) pseudogene, complete cds


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


Homo sapiens gene for AF-6, complete cds


Homo sapiens calcium channel, voltage-dependent, alpha 2/delta subunit 1 (CACNA2D1), mRNA
Homo sapiens mRNA for transmembrane receptor protein


Homo sapiens PMP69 gene, exons 3,4,5,6 & 7


Homo sapiens retinoblastoma 1 (including osteosarcoma) (RB1) mRNA


Homo sapiens synapsin III (SYN3) mRNA, and translated products


Homo sapiens beta-tubulin mRNA, complete cds 


Homo sapiens beta-tubulin mRNA, complete cds 


QV-BT073-191298-012 BT073 Homo sapiens cDNA


QV-BT073-191298-012 BT073 Homo sapiens cDNA


EST380711 MAGE resequences, MAGJ Homo sapiens cDNA


tm69h07.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2163421 3' similar to SW:BID_HUMAN
P55957 BH3 INTERACTING DOMAIN DEATH AGONIST ;


tm69h07.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2163421 3' similar to SW:BID_HUMAN
P55957 BH3 INTERACTING DOMAIN DEATH AGONIST ;


PM2-MT0037-250700-003-G04 MT0037 Homo sapiens cDNA


zn90d02.r1 Stratagene lung carcinoma 937218 Homo sapiens cDNA clone IMAGE:565443 5' similar to
TR:G662994 G662994 GPI-ANCHORED PROTEIN P137. ;


Human endogenous retrovirus, complete genome


Homo sapiens oscillin (hLn) gene, exon 5


Homo sapiens NK-receptor (KIR-G2) gene, linker region exon


Human G2 protein mRNA, partial cds


Homo sapiens CD34 antigen (CD34) mRNA


Homo sapiens GAP-like protein (LOC51306), mRNA


Homo sapiens polycystic kidney disease (PKD1) gene, exons 27-30


Homo sapiens polycystic kidney disease (PKD1) gene, exons 27-30


H.sapiens mRNA for estrogen receptor


Homo sapiens ankyrin-like with transmembrane domains 1 (ANKTM1), mRNA


Homo sapiens NDST4 mRNA for N-deacetylase/N-sulfotransferase 4, complete cds


Homo sapiens lodestar protein mRNA, complete cds


Homo sapiens lodestar protein mRNA, complete cds


Homo sapiens inositol 1,4,5-triphosphate receptor, type 1 (ITPR1), mRNA


Homo sapiens BH3 interacting domain death agonist (BID), mRNA
Homo sapiens UDP-glucose:glycoprotein glucosyltransferase 1 (HUGT1), mRNA


H.sapiens IMPA gene, exon 8


Homo sapiens T cell receptor beta locus, TCRBV7S3A2 to TCRBV12S2 region


601513157F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3914391 5'


Human E2A/HLF fusion protein (E2A/HLF) mRNA, complete cds


601108219F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3349997 5'
601433087F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3918524 5'


RC1-CT0249-090800-024-d05 CT0249 Homo sapiens cDNA
Homo sapiens Xq pseudoautosomal region; segment 1/2


Human IFNAR gene for interferon alpha/beta receptor
Homo sapiens NY-REN-25 antigen mRNA, partial cds


Human IFNAR gene for interferon alpha/beta receptor


Homo sapiens sodium-dependent high-affinity dicarboxylate transporter (NADC3) mRNA, complete cds


Homo sapiens BAZ1B mRNA for bromodomain adjacent to zinc finger domain 1B, complete cds


QV2-HT0540-120900-358-a05 HT0540 Homo sapiens cDNA


Homo sapiens cathepsin Z precursor (CTSZ) gene, exon 3
Homo sapiens mRNA for KIAA0453 protein, partial cds


PM1-CN0031-190100-001-d03 CN0031 Homo sapiens cDNA


PM1-CN0031-190100-001-d03 CN0031 Homo sapiens cDNA
601567619F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3842309 5'


PM1-CN0031-190100-001-d03 CN0031 Homo sapiens cDNA


PM1-CN0031-190100-001-d03 CN0031 Homo sapiens cDNA


Homo sapiens SMT3 (suppressor of mif two 3, yeast) homolog 2 (SMT3H2), mRNA


Homo sapiens myotubularin (MTM1) gene, exon 9


EST381116 MAGE resequences, MAGK Homo sapiens cDNA


601442558F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3846494 5'


Homo sapiens A kinase (PRKA) anchor protein 10 (AKAP10), mRNA


Homo sapiens general transcription factor IIIC, polypeptide 1 (alpha subunit, 220kD) (GTF3C1), mRNA


Homo sapiens general transcription factor IIIC, polypeptide 1 (alpha subunit, 220kD) (GTF3C1), mRNA


nu93c12.s1 NCI_CGAP_Pr22 Homo sapiens cDNA clone IMAGE:1218262 3' similar to SW:GTT2_HUMAN
P30712 GLUTHATHIONE S-TRANSFERASE THETA 2 ;


Homo sapiens guanylate cyclase activator 1A (retina) (GUCA1A) mRNA


Homo sapiens KIAA0377 gene product (KIAA0377), mRNA


ya48e06.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:53057 5'


AU137282 PLACE1 Homo sapiens cDNA clone PLACE1006159 5'


602136446F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4272922 5'


Homo sapiens placental protein 11 (serine proteinase) (P11) mRNA


RC1-HT0615-200400-022-d04 HT0615 Homo sapiens cDNA


CM1-UT0038-060900-399-h07 UT0038 Homo sapiens cDNA
Human interleukin 4 (IL-4) gene, complete cds


Human interleukin 4 (IL-4) gene, complete cds


qg86h08.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:1842111 3'


Homo sapiens myosin, heavy polypeptide 4, skeletal muscle (MYH4), mRNA


RC5-BT0580-170300-021-F08 BT0580 Homo sapiens cDNA


Homo sapiens mRNA for KIAA1591 protein, partial cds


Homo sapiens AT-binding transcription factor 1 (ATBF1), mRNA


601809495F1 NIH_MGC_18 Homo sapiens cDNA clone IMAGE:4040279 5'


601809495F1 NIH_MGC_18 Homo sapiens cDNA clone IMAGE:4040279 5'


Novel human gene mapping to chromosome 13


PM0-BT0340-091299-002-e05 BT0340 Homo sapiens cDNA


7B18H01 Chromosome 7 Fetal Brain cDNA Library Homo sapiens cDNA clone 7B18H01


601479417F1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3882124 5'


601479417F1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3882124 5'
601289760F1 NIH_MGC_8 Homo sapiens cDNA clone IMAGE:3620030 5'


601289760F1 NIH_MGC_8 Homo sapiens cDNA clone IMAGE:3620030 5'


HSC1EC121 normalized infant brain cDNA Homo sapiens cDNA clone c-1ec12


601063030F1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:3449599 5'


601063030F1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:3449599 5'


Homo sapiens ATP-binding cassette, sub-family B (MDR/TAP), member 4 (ABCB4), transcript variant B,
mRNA


Homo sapiens ATP-binding cassette, sub-family B (MDR/TAP), member 4 (ABCB4), transcript variant B,
mRNA


Homo sapiens glutamate receptor, ionotropic, N-methyl D-aspartate 2A (GRIN2A) mRNA


Homo sapiens glutamate receptor, ionotropic, N-methyl D-aspartate 2A (GRIN2A) mRNA


601152078F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3508362 5'


601162078F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3508362 5'


601297709F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3627554 5'


601297709F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3627554 5'


RC1-FT0134-280600-021-d02 FT0134 Homo sapiens cDNA


Homo sapiens transmembrane protein 2 (TMEM2), mRNA


Human erg protein (ets-related gene) mRNA, complete cds


Homo sapiens RAN binding protein 7 (RANBP7), mRNA


Homo sapiens RAN binding protein 7 (RANBP7), mRNA


UI-HF-BN0-akj-b-12-0-UI.r1 NIH_MGC_50 Homo sapiens cDNA clone IMAGE:3077326 5'
hh81a09.y1 NCI_CGAP_GU1 Homo sapiens cDNA clone IMAGE:2969176 5' similar to TR:O60327 O60327
KIAA0584 PROTEIN ;


601105529F1 NIH_MGC_15 Homo sapiens cDNA clone IMAGE:2988366 5'


nc80b03.r1 NCI_CGAP_GC1 Homo sapiens cDNA clone IMAGE:797059 5' similar to SW:FEN1_HUMAN
P39748 FLAP ENDONUCLEASE-1 ;


nc80b03.r1 NCI_CGAP_GC1 Homo sapiens cDNA clone IMAGE:797069 5' similar to SW:FEN1_HUMAN
P39748 FLAP ENDONUCLEASE-1 ;


Homo sapiens mRNA for multidrug resistance protein 3 (ABCC3)


Homo sapiens mRNA for multidrug resistance protein 3 (ABCC3)


Homo sapiens mRNA for multidrug resistance protein 3 (ABCC3)


yd15c01.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:108288 3' similar to
gb:A21187 ALPHA-2-MACROGLOBULIN PRECURSOR (HUMAN);contains Alu repetitive element;


Homo sapiens hypothetical protein FLJ20080 (FLJ20080), mRNA


Homo sapiens rhabdoid tumor deletion region protein 1 (RTDR1), mRNA


Homo sapiens minichromosome maintenance deficient (S. cerevisiae) 3 (MCM3), mRNA


Homo sapiens PRKY exon 7


qp01f05.x1 NCI_CGAP_Kid5 Homo sapiens cDNA clone IMAGE:1916769 3'
Pongo pygmaeus DNA, similar to pol gene of HERV-W and MSRV, isolate:ORW3-3


Human mRNA for ribosomal protein, complete cds


Homo sapiens calcium channel gamma 4 subunit (CACNG4) gene, exon 3


Homo sapiens calcium channel gamma 4 subunit (CACNG4) gene, exon 3


Homo sapiens reelin (RELN), mRNA


Homo sapiens reelin (RELN), mRNA


Human GS2 gene, exon 6


Human GS2 gene, exon 6


Human cystic fibrosis transmembrane conductance regulator (CFTR) gene, exon 4


Homo sapiens T-box 4 (TBX4), mRNA
Homo sapiens transient receptor potential channel 5 (TRPC5), mRNA


Homo sapiens latent transforming growth factor beta binding protein 2 (LTBP2) mRNA


Homo sapiens latent transforming growth factor beta binding protein 2 (LTBP2) mRNA


DKFZp434O0127_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434O0127 5'
Homo sapiens chromosome 2 open reading frame 3 (C2ORF3), mRNA


Homo sapiens very long chain acyl-CoA dehydrogenase gene, exons 1-20, complete cds


601469159F1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3872247 5'


QV0-BT0263-090200-097-h03 BT0263 Homo sapiens cDNA


QV0-BT0263-080200-097-h03 BT0263 Homo sapiens cDNA


zx98d07.r1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:811789 5'
Human mRNA for KIAA0383 gene, partial cds


Human mRNA for KIAA0383 gene, partial cds


Homo sapiens latent transforming growth factor beta binding protein 2 (LTBP2) mRNA


Homo sapiens latent transforming growth factor beta binding protein 2 (LTBP2) mRNA


601144863F2 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3160502 5'


DKFZp586K1824_r1 586 (synonym: hute1) Homo sapiens cDNA clone DKFZp586K1824


Homo sapiens hypothetical protein (DJ328E19.C1.1), mRNA


601307146F1 NIH_MGC_39 Homo sapiens cDNA clone IMAGE:3641603 5'
hj05c06.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2980906 3'


tj19e03.x1 NCI_CGAP_Gas4 Homo sapiens cDNA clone IMAGE:2141980 3' similar to TR:O31662 O31662
YKRS PROTEIN. ;


tj19e03.x1 NCI_CGAP_Gas4 Homo sapiens cDNA clone IMAGE:2141980 3' similar to TR:O31662 O31662
YKRS PROTEIN. ;


zt81b04.r1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:728719 5' similar to TR:G300482
G300482 POL=REVERSE TRANSCRIPTASE HOMOLOG {RETROVIRAL ELEMENT} ;


zt81b04.r1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:728719 5' similar to TR:G300482
G300482 POL=REVERSE TRANSCRIPTASE HOMOLOG {RETROVIRAL ELEMENT} ;


Homo sapiens mRNA for KIAA1093 protein, partial cds


Homo sapiens calcineurin binding protein 1 (KIAA0330), mRNA


Homo sapiens calcineurin binding protein 1 (KIAA0330), mRNA


Homo sapiens mRNA for KIAA1172 protein, partial cds


601577981F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3926885 5'


HA0086 Human fetal liver cDNA library Homo sapiens cDNA


HA0086 Human fetal liver cDNA library Homo sapiens cDNA


Homo sapiens ALR-like protein mRNA, partial cds


zk53c07.s1 Soares_pregnant_uterus_NbHPU Homo sapiens cDNA clone IMAGE:486540 3' similar to
gb:X65857_cds1 OLFACTORY RECEPTOR-LIKE PROTEIN HGMP07E (HUMAN);


Homo sapiens chromosome 21 segment HS21C010


Homo sapiens KIAA0744 gene product; histone deacetylase 7 (KIAA0744), mRNA


Homo sapiens KIAA0022 gene product (KIAA0022), mRNA


Homo sapiens Bruton's tyrosine kinase (BTK), alpha-D-galactosidase A (GLA), L44-like ribosomal protein
(L44L) and FTP3 (FTP3) genes, complete cds


Homo sapiens Usurpin-alpha mRNA, complete cds


Homo sapiens Usurpin-alpha mRNA, complete cds
Homo sapiens inhibin, alpha (INHA) mRNA


Homo sapiens inhibin, alpha (INHA) mRNA


bb74f06.y1 NIH_MGC_12 Homo sapiens cDNA clone IMAGE:3048131 5' similar to TR:O95604 O95604
ZINC FINGER PROTEIN. ;
zk53c07.s1 Soares_pregnant_uterus_NbHPU Homo sapiens cDNA clone IMAGE:486540 3' similar to
gb:X65857_cds1 OLFACTORY RECEPTOR-LIKE PROTEIN HGMP07E (HUMAN);


Homo sapiens zinc finger protein ZNF287 (ZNF287), mRNA


Homo sapiens zinc finger protein ZNF287 (ZNF287), mRNA


601141152F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3140796 5'


Homo sapiens KIAA0985 protein (KIAA0985), mRNA
601433472F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3918952 5'


tu67c07.x1 NCI_CGAP_Gas4 Homo sapiens cDNA clone IMAGE:2256108 3' similar to WP:C45G9.2
601335826F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3689790 5'


Homo sapiens IGF-II gene, exon 5


Homo sapiens IGF-II gene, exon 5
Homo sapiens adaptor-related protein complex 2, beta 1 subunit (AP2B1), mRNA


Human chromosome 10 duplicated adrenoleukodystrophy (ALD) gene segment containing exons 8-10


Human chromosome 10 duplicated adrenoleukodystrophy (ALD) gene segment containing exons 8-10
LAMBDA/IOTA PROTEIN KINASE C-INTERACTING PROTEIN. [1] ;
LAMBDA/IOTA PROTEIN KINASE C-INTERACTING PROTEIN. [1] ;
Homo sapiens myosin, heavy polypeptide 1, skeletal muscle, adult (MYH1), mRNA
Homo sapiens hypothetical protein FLJ20048 (FLJ20048), mRNA


Homo sapiens RAN binding protein 2 (RANBP2), mRNA


zo72c03.r1 Stratagene pancreas (#937208) Homo sapiens cDNA clone IMAGE:592420 5'


zo72c03.r1 Stratagene pancreas (#937208) Homo sapiens cDNA clone IMAGE:592420 5'


H.sapiens DNA for liver cytochrome b5 pseudogene


Homo sapiens death receptor 6 (DR6), mRNA


Homo sapiens collagen type XI alpha-1 (COL11A1) gene, exon 63


Homo sapiens collagen type XI alpha-1 (COL11A1) gene, exon 63
ya52b12.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE £6527 3'


zx66e03.r1 Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone IMAGE:799444 5' similar to
TR:G1145880 G1145880 TITIN ;


Homo sapiens mRNA for KIAA1525 protein, partial cds


Homo sapiens mRNA for KIAA1525 protein, partial cds


Homo sapiens ciliary dynein heavy chain 9 (DNAH9) mRNA, complete cds


Homo sapiens ciliary dynein heavy chain 9 (DNAH9) mRNA, complete cds


wf08f01.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2350009 3' similar to
SW:MPP2_HUMAN Q14168 MAGUK P55 SUBFAMILY MEMBER 2 ;
Homo sapiens mRNA for KIAA1294 protein, partial cds


Human mRNA for ankyrin (variant 2.1)
Homo sapiens neuro-oncological ventral antigen 1 (NOVA1), splice variant 1, mRNA


602139138F1 NIH_MGC_46 Homo sapiens cDNA clone IMAGE:4298240 5'


601149404F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3502129 5'


601577981F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3926685 5'


Homo sapiens mRNA for casein kinase I epsilon, complete cds


Homo sapiens mRNA for casein kinase I epsilon, complete cds


Homo sapiens mRNA for casein kinase I epsilon, complete cds


Homo sapiens DNA for amyloid precursor protein, complete cds


Homo sapiens DNA for amyloid precursor protein, complete cds


Homo sapiens intersectin short isoform (ITSN) mRNA, complete cds


Homo sapiens lost on transformation LOT1 mRNA, complete cds


Homo sapiens ubiquitin specific protease 8 (USP8) mRNA


Homo sapiens leukocyte immunoglobulin-like receptor, subfamily A (with TM domain), member 1 (LILRA1),
mRNA


Homo sapiens leukocyte immunoglobulin-like receptor, subfamily A (with TM domain), member 1 (LILRA1),
mRNA


Homo sapiens ribosomal protein L26 (RPL26) mRNA


Homo sapiens adlican mRNA, complete cds


Human mRNA for cytokeratin 18
Homo sapiens intersectin short isoform (ITSN) mRNA, complete cds


au80e06.y1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2782594 5' similar to
TR:Q15170 Q15170 TRANSCRIPTION FACTOR S-II-RELATED PROTEIN contains element MER22
repetitive element ;


Homo sapiens chromosome 21 segment HS21C047


Homo sapiens neuroblastoma-amplified protein (LOC51594), mRNA


Homo sapiens neuroblastoma-amplified protein (LOC51594), mRNA


Homo sapiens cytochrome P450 retinoid metabolizing protein P450RAI-2 mRNA, complete cds
Homo sapiens RAD1 (S. pombe) homolog (RAD1) mRNA, and translated products


Homo sapiens glutathione S-transferase theta 2 (GSTT2) and glutathione S-transferase theta 1 (GSTT1) 
genes, complete cds 


Homo sapiens zinc finger protein 76 (expressed in testis) (ZNF76), mRNA


ZINC FINGER PROTEIN HZF10


ZINC FINGER PROTEIN HZF10


ZINC FINGER PROTEIN HZF10


Homo sapiens mRNA for KIAA1459 protein, partial cds


CMYA5 Human cardiac muscle expression library Homo sapiens cDNA clone 4151935 similar to CMYA5
Cardiomyopathy associated gene 5


CMYA5 Human cardiac muscle expression library Homo sapiens cDNA clone 4151935 similar to CMYA5
Cardiomyopathy associated gene 5


Homo sapiens KVLQT1 gene


601513861F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3915350 5'


Homo sapiens KVLQT1 gene


Homo sapiens similar to ribosomal protein S26 (H. sapiens) (LOC63694), mRNA


Homo sapiens WSCR4 gene, exons 3 and 4


Homo sapiens WSCR4 gene, exons 3 and 4


Homo sapiens mRNA for KIAA0634 protein, partial cds


Homo sapiens solute carrier family 21 (organic anion transporter), member 9 (SLC21A9), mRNA


Homo sapiens solute carrier family 21 (organic anion transporter), member 9 (SLC21A9), mRNA


qi40d08.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1858959 3' similar to TR:Q14840 Q14840
MITOGEN INDUCIBLE GENE MIG-2 ;


qi40d08.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1858959 3' similar to TR:Q14840 Q14840
MITOGEN INDUCIBLE GENE MIG-2 ;


af72f07.r1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:1047589 5'


Homo sapiens similar to ribosomal protein S26 (H. sapiens) (LOC63694), mRNA


yq49c05.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:199112 5' similar to
SP:B48150 B48150 HP-25=HIBERNATION-RELATED PROTEIN - TAMIAS ASIATICUS=ASIAN ;


DKFZp762K171_r1 762 (synonym: hmel2) Homo sapiens cDNA clone DKFZp762K171 5'


Homo sapiens hypothetical protein (HSPC242), mRNA


601121995F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:3346366 5'


601121995F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:3346366 5'
Human gene for catalase (EC 1.11.1.6) exon 9 mapping to chromosome 11, band p13


Top Hit
Database
Source


NT


NT


SWISSPROT


SWISSPROT


SWISSPROT


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


NT


NT


NT


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN




Top Hit Accession
No.


AF240786.1


11418522


Q14585


Q14585


Q14585


AB040892.1


AW755254.1


AW755254.1


AJ006345.1


BE888934.1


AJ006345.1


11420850


AF041056.1


AF041056.1


AB014534.1


11437282


11437282


AM 991 17.1


AH 991 17.1


AA625526.1


11420850


H83155.1


AL120739.1


7705530


BE275192.1




Most Similar 
(Top) Hit 
BLAST E 
Value 


1.0E-129


ULl 

a 


o> 

CN 

LU 
o 


till 
o 


UJ 

o 


1.0E-129


1.0E-129


1.0E-129


a 

111 

o 


1.0E-129


1.0E-129


1.0E-129


1.0E-129


1.0E-129


1.0E-129


1.0E-129


LU 

o 


1.0E-129


LU 
O 


1.0E-129


c^ 

LU 
o 


1.0E-129


Ul 
o 


1.0E-130 


1.0E-130 


o o 

CO CO 

UJ LU 
o o 


Expression 
Signal 


u> 


s 

c\i 


1.33


1.33


1.33


1.87


1.86


1.86


4.28


0.54


4.07


CO 

CD 


0.78


0.78


4.37


0.79


0.70


0.48


5 

d 


2.69


CN 

co 


4.21 


2.63


0.65


13.33


13.33 
3.15 


ORF SEQ
ID NO:


[column values illegible in source]


14766


14888


16202


16202


16202


17223


17338


17338


19284


19731


20305


20364


20724


20724


21629


23361


23361


23798


23798


24497


20364


25164


25401


13194


14707


14707
15021


Probe 
SEQ ID 

NO: 


1736 


1863


LO
CO


3145


3145


4192


4309


4309


6210


6674


7334


CD

E


7771


7771


1


10439


10439


10878


10878


11557


11630


CO
CO

8


12758


S- "


1675


1675
2000



476/546 



WO 01/57276



PCT/US01/00668 




477/546 



WO 01/57276 



PCT/US01/00668



































Homo sapiens amiloride binding protein 1 (amine oxidase (copper-containing)) (ABP1), nuclear gene
encoding mitochondrial protein, mRNA
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Top Hit Descriptor 


Homo sapiens beta-tubulin mRNA, complete cds


Human heparin cofactor II (HCF2) gene, exons 1 through 5


Homo sapiens RNA-binding protein S1, serine-rich domain (RNPS1), mRNA


Homo sapiens mRNA for multidrug resistance protein 3 (ABCC3)


Homo sapiens mRNA for multidrug resistance protein 3 (ABCC3)


HUM516H08B Human placenta polyA+ (TFujiwara) Homo sapiens cDNA clone GEN-516H0J


HUM516H08B Human placenta polyA+ (TFujiwara) Homo sapiens cDNA clone GEN-516H0!


Human ribosomal protein L7 (RPL7) mRNA, complete cds


cr48e07.x1 Jia bone marrow stroma Homo sapiens cDNA clone HBMSC_cr48e07 3'


cr48e07.x1 Jia bone marrow stroma Homo sapiens cDNA clone HBMSC_cr48e07 3'


Human von Willebrand factor pseudogene corresponding to exons 23 through 34


Homo sapiens protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1) mRNA


Homo sapiens protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1) mRNA


Homo sapiens protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1) mRNA


Homo sapiens protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1) mRNA


Homo sapiens heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) mRNA


Homo sapiens actin, beta (ACTB) mRNA


Human polyhomeotic 1 homolog (HPH1) mRNA, partial cds


HA1347 Human fetal liver cDNA library Homo sapiens cDNA


Homo sapiens mRNA for KIAA1363 protein, partial cds


ts38b05.x1 NCI_CGAP_Ut4 Homo sapiens cDNA clone IMAGE:2230833 3' similar to TR:Q*
MITOCHONDRIAL TRANSCRIPTION TERMINATION FACTOR PRECURSOR. ;


ts38b05.x1 NCI_CGAP_Ut4 Homo sapiens cDNA clone IMAGE:2230833 3' similar to TR:QS
MITOCHONDRIAL TRANSCRIPTION TERMINATION FACTOR PRECURSOR. ;


yy01h09.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:270017 5'


yy01h09.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:270017 5'


Homo sapiens neuropilin 2 (NRP2) mRNA


Homo sapiens polymerase (RNA) II (DNA directed) polypeptide A (220kD) (POLR2A) mRNA


Homo sapiens polymerase (RNA) II (DNA directed) polypeptide A (220kD) (POLR2A) mRNA


Homo sapiens IgG Fc binding protein (FC(GAMMA)BP) mRNA


ya83g04.r2 Stratagene fetal spleen (#937205) Homo sapiens cDNA clone IMAGE:68310 5'
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Top Hit Descriptor 


Homo sapiens heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) mRNA


601460375F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3663803 5'


Homo sapiens heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) mRNA


Homo sapiens serine palmitoyl transferase, subunit II gene, complete cds; and unknown genes
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gb:X1 6282_cds1 ZINC F~NGER PROTEIN CLONE 647 (HUMAN); ! 
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| Homo sapiens mRNA for K1AA0784 protein, partial ods j 


Homo sapiens mRNA for KIAA0784 protein, partial cds


Homo sapiens mRNA for KIAA0784 protein, partial cds


Homo sapiens mRNA for KIAA0784 protein, partial cds


Human gamma-cytoplasmic actin (ACTGP9) pseudogene


Homo sapiens CTCL tumor antigen se14-3 mRNA, complete cds


Homo sapiens CTCL tumor antigen se14-3 mRNA, complete cds


Homo sapiens chromosome X MSL3-2 protein mRNA, complete cds


Homo sapiens chromosome X MSL3-2 protein mRNA, complete cds
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tq04f08.xi NC!_CGAP_Ut3 Homo sapiens cDNA clone IMAGE:2207847 3' similar to gb:J03191 PROFILIN I 
(HUMAN); 


Homo sapiens DNA mismatch repair protein (MLH3) gene, complete cds


Homo sapiens ribosomal protein L31 (RPL31) mRNA


Homo sapiens TADA1 protein mRNA, complete cds


Homo sapiens mRNA for KIAA0721 protein, partial cds
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801822627F1 NIH_MGC_75 Homo sapiens cDNA clone IMAGE:4045447 5' j 


Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA


Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA


Homo sapiens Smad- and Olf-interacting zinc finger protein mRNA, partial cds


Homo sapiens Smad- and Olf-interacting zinc finger protein mRNA, partial cds


Homo sapiens NOD1 protein (NOD1) gene, exons 1, 2, and 3


Homo sapiens mRNA for KIAA1386 protein, partial cds


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA
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Homo sapiens RGH2 gene, retrovirus-like element I 
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gb:A21187 ALPHA-2-MACROGLOBULIN PRECURSOR (HUMAN); 


Homo sapiens novel SH2-containing protein 3 (NSP3) mRNA
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Human TFEB protein mRNA, partial cds 
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xl69b01.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2679913 3' 


Homo sapiens calcineurin binding protein 1 (KIAA0330), mRNA 


Homo sapiens mRNA for KIAA0577 protein, complete cds 


H.sapiens genes for semenogelin I and semenogelin II


H.sapiens genes for semenogelin I and semenogelin II


Homo sapiens mRNA for KIAA1513 protein, partial cds


Homo sapiens SMCY (SMCY) gene, complete cds


Homo sapiens TP53TG3a (TP53TG3a), mRNA


Homo sapiens coagulation factor IX (plasma thromboplastic component, Christmas disease, hemoph
(F9) mRNA


AU140831 PLACE4 Homo sapiens cDNA clone PLACE4000321 5'


7B22E10 Chromosome 7 Fetal Brain cDNA Library Homo sapiens cDNA clone 7B22E10
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Top Hit Descriptor 


Homo sapiens ATP-sensitive inwardly rectifying K-channel subunit (KCNJ6/BIR1) gene, complete cds


Homo sapiens methyl CpG binding protein 2 (MECP2), mRNA


Homo sapiens KIAA0569 gene product (KIAA0569), mRNA


Homo sapiens 5-hydroxytryptamine (serotonin) receptor 1D (HTR1D) mRNA


Homo sapiens gene for TMEM1 and PWP2,complete and partial cds


Homo sapiens gene for TMEM1 and PWP2,complete and partial cds


Homo sapiens transient receptor potential channel 5 (TRPC5), mRNA


Homo sapiens chromosome X open reading frame 5 (CXORF5) mRNA


Homo sapiens chromosome X open reading frame 5 (CXORF5) mRNA


Human zinc finger protein ZNF134 mRNA, complete cds


Homo sapiens intersectin short isoform (ITSN) mRNA, complete cds


Homo sapiens potassium voltage-gated channel, Shab-related subfamily, member 1 (KCNB1) mRNA


Homo sapiens familial mental retardation protein 2 (FMR2) gene, exon 1 1


Homo sapiens SC35-interacting protein 1 (SRRP129), mRNA


Homo sapiens amphiphysin gene, partial cds


wk01f01.x1 NCI_CGAP_Lym12 Homo sapiens cDNA clone IMAGE:2411065 3' similar to TR:O43340
O43340 R28830_2. contains element PTR7 repetitive element ;


Homo sapiens ribosomal protein S8 (RPS8), mRNA


DKFZp434N0413_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434N0413 5'


Homo sapiens AP1 gamma subunit binding protein 1 (AP1GBP1), mRNA


Homo sapiens AP1 gamma subunit binding protein 1 (AP1GBP1), mRNA


Homo sapiens glutamate receptor, metabotropic 3 (GRM3) mRNA


Homo sapiens melanoma antigen, family B, 1 (MAGEB1) mRNA


Homo sapiens HBP17 heparin-binding and FGF-binding protein gene, complete cds


Homo sapiens ryanodine receptor 3 (RYR3) mRNA


Homo sapiens zinc finger protein (KIAA0412) mRNA


RC3-HT0860-170800-011-a12 HT0860 Homo sapiens cDNA


MXRA5 Human matrix tissue expression library Homo sapiens cDNA clone Incyte 1996726 similar to MXRA5
Matrix remodeling associated gene 5


MXRA5 Human matrix tissue expression library Homo sapiens cDNA clone Incyte 1996726 similar to MXRA5
Matrix remodeling associated gene 5


Homo sapiens F-box protein Fbl3b (FBL3B) mRNA, partial cds


Top Hit 
Database 
Source 
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NT | 
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NT 
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NT 
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co 

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AF099117.1 | 


AI864727.1 


4506742
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5 
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4585642 1 


BF355295.1


AW888221.1
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AF1 29533.1 | 
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(Top) Hit 
BLAST E 
Value 
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Q.OE+OOl 
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0.0E+00) 
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0.0E+00 1 
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0.0E+00 j 
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O.OE+00, 


0.0E+00
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Exon 
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NO: 
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Top Hit Descriptor 


Homo sapiens GA-binding protein transcription factor, alpha subunit (60kD) (GABPA), mRNA


Homo sapiens semenogelin II (SEMG2), mRNA


Homo sapiens hypothetical protein FLJ10379 (FLJ10379), mRNA


Homo sapiens hypothetical protein FLJ10379 (FLJ10379), mRNA


Homo sapiens mRNA for KIAA0895 protein, partial cds
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wu04d04.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2615975 3' | 


MR1-HT0707-100500-001-a02 HT0707 Homo sapiens cDNA


MR1-HT0707-100500-001-fi02 HT0707 Homo sapiens cDNA


601120778F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:29676S0 5'


Homo sapiens mRNA for KIAA1125 protein, partial cds


Homo sapiens mRNA for KIAA1125 protein, partial cds


Homo sapiens transglutaminase 3 (E polypeptide, protein-glutamine-gamma-glutamyltransferase) (TGM3)
mRNA


Homo sapiens nuclear receptor coactivator 3 (NCOA3), mRNA


ba51f04.x1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:2900095 3' similar to SW:THI2_BOVIN
Q95108 MITOCHONDRIAL THIOREDOXIN PRECURSOR ;


UI-HF-BM0-adx-c-02-0-UI.r1 NIH_MGC_38 Homo sapiens cDNA clone IMAGE:3063147 5'


Homo sapiens hypothetical protein FLJ10498 (FLJ10498), mRNA


Homo sapiens hypothetical protein FLJ10498 (FLJ10498), mRNA


Homo sapiens polycystic kidney disease (polycystin) and REJ (sperm receptor for egg jelly, sea urchin
homolog)-like (PKDREJ) mRNA


zu68h07.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:743197 3' similar to contains Alu
repetitive element;contains element MER35 repetitive element ;


zu68h07.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:743197 3' similar to contains Alu
repetitive element;contains element MER35 repetitive element ;


Homo sapiens titin (TTN) mRNA


Homo sapiens titin (TTN) mRNA


Homo sapiens chromosome 21 segment HS21C103


Homo sapiens mRNA for olfactory receptor protein, pseudogene


Human apolipoprotein B-100 mRNA, complete cds


PM2-DT0023-080300-004-a08 DT0023 Homo sapiens cDNA


Homo sapiens myelodysplasia syndrome 1 (MDS1) mRNA
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Top Hit Descriptor 


zp18g08.s1 Stratagene fetal retina 937202 Homo sapiens cDNA clone IMAGE:609854 3'


Homo sapiens odz (odd Oz/ten-m, Drosophila) homolog 1 (ODZ1), mRNA


Homo sapiens chromosome 21 segment HS21C084


yt92b01.s1 Soares_pineal_gland_N3HPG Homo sapiens cDNA clone IMAGE:231721 3'


yt92b01.s1 Soares_pineal_gland_N3HPG Homo sapiens cDNA clone IMAGE:231721 3'


Homo sapiens cyclophilin-related protein (NKTR) gene, complete cds


Homo sapiens chromosome 21 segment HS21C100


Homo sapiens gene for natriuretic protein, partial cds


Homo sapiens DNA mismatch repair protein (MLH3) gene, complete cds


Novel human gene mapping to chromosome 1


Homo sapiens keratin 18 (KRT18) mRNA


Homo sapiens keratin 18 (KRT18) mRNA


Mus musculus E-cadherin binding protein E7 mRNA, complete cds


Homo sapiens ADP/ATP carrier protein (ANT-2) gene, complete cds


Homo sapiens ADP/ATP carrier protein (ANT-2) gene, complete cds


Homo sapiens ADP/ATP carrier protein (ANT-2) gene, complete cds


Homo sapiens mRNA for KIAA1047 protein, partial cds


Human endogenous retrovirus type K (HERV-K), gag, pol and env genes


Homo sapiens truncated tenascin XB (TNXB) gene, partial cds and TNXA gene recombination breakpoint
region


Homo sapiens mRNA for KIAA1399 protein, partial cds


Homo sapiens mRNA for KIAA1399 protein, partial cds


Human displacement protein (CCAAT) mRNA


Homo sapiens butyrophilin, subfamily 2, member A2 (BTN2A2), mRNA


Homo sapiens butyrophilin, subfamily 2, member A2 (BTN2A2), mRNA
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ya83g04.r2 Stratagene fetal spleen (#937205) Homo sapiens cDN A clone IMAGE:6831 0 5" ] 


601158935F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3505521 5'


Homo sapiens ecotropic viral integration site 2B (EVI2B), mRNA


Homo sapiens ecotropic viral integration site 2B (EVI2B), mRNA


Human AHNAK nucleoprotein mRNA, 5' end


Human haptoglobin and haptoglobin-related protein (HP and HPR) genes, complete cds


Human haptoglobin and haptoglobin-related protein (HP and HPR) genes, complete cds
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Value 
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ORF SEQ 
ID NO: 
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30568 | 
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30592] 
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26174) 


26175
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30857] 
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30661 [ 
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s 
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CO 
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17703 | 


17703


17703


17704


17709


17720


17725


17725


17726


17730


17730


13244


13244


17733


17757


17757


17763


17766


17766


Probe 
SEQ ID 

NO: 
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! 4664 | 
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s 


4872


4673


4682
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S I 

<o C 


4683


4688




4704


4704


4705


4709


4709


4711


4711
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Top Hit Descriptor 


Homo sapiens cyclophilin-related protein (NKTR) gene, complete cds


Homo sapiens KIAA1084 protein (KIAA1084), mRNA


Homo sapiens KIAA0563 gene product (KIAA0563), mRNA


SCN1A=brain type I sodium channel alpha-subunit {IIIS5 transmembrane region} [human, placenta, Genomic,
1556 nt]


SCN1A=brain type I sodium channel alpha-subunit {IIIS5 transmembrane region} [human, placenta, Genomic,
1556 nt]


Novel human mRNA from chromosome 1, which has similarities to BAT2 genes


Human CYP2D7AP pseudogene for cytochrome P450 2D6


Homo sapiens bromodomain adjacent to zinc finger domain, 2B (BAZ2B), mRNA
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| Homo sapiens G-protein coupled receptor (RE2), mRNA 


Homo sapiens G-protein coupled receptor (RE2), mRNA


Homo sapiens proteinx0008 (AD013), mRNA


Homo sapiens proteinx0008 (AD013), mRNA
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| Homo sapiens aldehyde dehydrogenase 1 2 (ALDH 1 2) mRNA, complete cds I 


Homo sapiens HSPC024-iso mRNA, complete cds


Human MHC class I transplantation antigen (hla) gene


Human MHC class I transplantation antigen (hla) gene


Homo sapiens glutathione S-transferase theta 2 (GSTT2) and glutathione S-transferase theta 1 (GSTT1)
genes, complete cds


M.fascicularis mRNA for metalloprotease-like, disintegrin-like protein, IVa


Homo sapiens Williams-Beuren syndrome deletion transcript 9 (WBSCR9) mRNA, complete cds


Mus musculus zinc finger transcription factor Kaiso mRNA, complete cds


Homo sapiens fragile X mental retardation 2 (FMR2) mRNA


Homo sapiens actin, alpha, cardiac muscle (ACTC), mRNA


ZINC FINGER PROTEIN 132


Homo sapiens hypothetical protein DKFZp762E1312 (DKFZp762E1312), mRNA


Homo sapiens hypothetical protein FLJ20073 (FLJ20073), mRNA


Human Tcr-C-delta gene, exons 1-4; Tcr-V-delta gene, exons 1-2; T-cell receptor alpha (Tcr-alpha) gene, J1-
J61 segments; and Tcr-C-alpha gene, exons 1-4
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Top Hit Descriptor 


Homo sapiens mRNA for KIAA1043 protein, partial cds


Homo sapiens hypothetical protein FLJ20477 (FLJ20477), mRNA


Homo sapiens hypothetical protein FLJ20477 (FLJ20477), mRNA


no14g09.s1 NCI_CGAP_Phe1 Homo sapiens cDNA clone IMAGE:1100704 3' similar to TR:E239140
E239140 SPALT PROTEIN ;


no14g09.s1 NCI_CGAP_Phe1 Homo sapiens cDNA clone IMAGE:1100704 3' similar to TR:E239140
E239140 SPALT PROTEIN ;


no14g09.s1 NCI_CGAP_Phe1 Homo sapiens cDNA clone IMAGE:1100704 3' similar to TR:E239140
E239140 SPALT PROTEIN ;


Homo sapiens E2F transcription factor 2 (E2F2) mRNA
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Homo sapiens chromosome 21 segment HS21 C009 


Homo sapiens gamma-cytoplasmic actin (ACTGP3) pseudogene


Bacillus amyloliquefaciens sacB gene for levansucrase (EC 2.4.1.10)


Homo sapiens vascular endothelial cadherin 2 mRNA, complete cds


Homo sapiens vascular endothelial cadherin 2 mRNA, complete cds
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Homo sapiens putative GPR37 gene, exon 2 | 


Homo sapiens putative GPR37 gene, exon 2
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Homo sapiens 4F2 light chain (LOC51597), mRNA I 


Homo sapiens 4F2 light chain (LOC51597), mRNA


Human versican V2 core protein precursor splice-variant mRNA, complete cds


Homo sapiens serine-threonine protein kinase (MNBH) mRNA, complete cds


Homo sapiens serine-threonine protein kinase (MNBH) mRNA, complete cds


Homo sapiens jumonji (mouse) homolog (JMJ) mRNA


Human oligodendrocyte myelin glycoprotein (OMG) exons 1-2; neurofibromatosis 1 (NF1) exons 28-49;
ecotropic viral integration site 2B (EVI2B) exons 1-2; ecotropic viral integration site 2A (EVI2A) exons 1-2;
adenylate kinase (AK3) exons 1-2


Homo sapiens mRNA for neurexin I-alpha protein, complete cds


DKFZp434I0713_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434I0713 5'


Homo sapiens aconitase (ACO2) gene, nuclear gene encoding mitochondrial protein, exon 15


Homo sapiens keratin 12 (KRT12) gene, complete cds
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Top Hit Descriptor 


MR0-BT0264-221199-002-f11 BT0264 Homo sapiens cDNA


Homo sapiens Achaete-Scute homologue 2 (ASCL2) gene, complete cds


AU119245 HEMBA1 Homo sapiens cDNA clone HEMBA1005360 5'


AU119245 HEMBA1 Homo sapiens cDNA clone HEMBA1005360 5'
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|601105344F1 NIH_MOCJ5 Homo sapiens cDNA clone IMAGE:2987963 5' j 


601105344F1 NIH_MGC_15 Homo sapiens cDNA clone IMAGE:2987963 5'


601443175F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3847291 5'
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|AV719444 GLC Homo sapiens cDNA clone GLCEHC06 5' | 


601681150F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3951301 5'
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Homo sapiens low voltage-activated T-type calcium channel alpha 1 G splice variant CavT.la (CACNA1G) 
mRNA, complete cds 
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au96h08.y1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2784159 5' similar to 
TR:O15390 O15390 GT24. [3] TR:O43840 TR:O43206 ;


au96h08.y1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2784159 5' similar to
TR:O15390 O15390 GT24. [3] TR:O43840 TR:O43206 ;
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|601 589371 F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3943504 5' J 


601587561F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3941847 5'


QV1-GN0065-140800-318-h02 GN0065 Homo sapiens cDNA


601512058F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3913311 5'


601512058F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3913311 5'
Human antigen CD27 gene, exons 1-2
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Top Hit Descriptor 


601339977F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3682267 5'


601443667F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3847697 5'


601443667F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3847697 5'
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7b49f03.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE3231581 3' similar to SW:GG95_HUMAN 
Q08379 GOLGIN-95. ; 


CM1-HT0877-060900-397-g11 HT0877 Homo sapiens cDNA




Homo sapiens catenin (cadherin-associated protein), delta 2 (neural plakophilin-related arm-repeat protein)
(CTNND2), mRNA


Homo sapiens sodium channel, nonvoltage-gated 1, beta (Liddle syndrome) (SCNN1B), mRNA


601150662F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3503391 5'


601150662F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3503391 5'
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Homo sapiens Bloom syndrome (BLM) mRNA | 


Human MYCL2 gene, complete cds


Homo sapiens cadherin 20 (CDH20) mRNA, complete cds


Homo sapiens cadherin 20 (CDH20) mRNA, complete cds


Human neurofibromatosis type 1 gene, exon x6


Homo sapiens melanoma antigen, family B, 2 (MAGEB2), mRNA


tg53c06.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2112490 3' similar to
SW:OXYB_HUMAN P22059 OXYSTEROL-BINDING PROTEIN. ;


tg53c06.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2112490 3' similar to
SW:OXYB_HUMAN P22059 OXYSTEROL-BINDING PROTEIN. ;


601115515F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3356330 5'


AU118478 HEMBA1 Homo sapiens cDNA clone HEMBA1003679 5'


601148954F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3501829 5'
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H.sapiens mRNA for latent transforming growth factor-beta binding protein (LTBP-2) I 


Homo sapiens ciliary dynein heavy chain 9 (DNAH9) mRNA, complete cds


Homo sapiens ciliary dynein heavy chain 9 (DNAH9) mRNA, complete cds


Homo sapiens NALP1 mRNA, complete cds
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601672310F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:3955131 5' | 


ze33h08.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:360831 5'
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Human amyloid-beta protein (APP) gene, exon 11 J 


Human amyloid-beta protein (APP) gene, exon 11


bb34d02.y1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:2985123 5' similar to TR:O64652 O64652
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zt81b04.r1 Stratagene schizo brain S1 1 Homo sapiens cDNA clone IMAGE:72871 9 5' similar to TR:G3 00482 
G300482 POL=REVERSE TRANSCRIPTASE HOMOLOG {RETROVIRAL ELEMENT} ; 
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| Homo sapiens mRNA for KIAA0884 protein, partial cds | 


AU142402 Y79AA1 Homo sapiens cDNA clone Y79AA1000277 5'
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|601285550F1 NIH_MGC_44 Homo sapiens cDNA clone !MAGE:3607237 5* \ 


Homo sapiens killer cell immunoglobulin-like receptor, two domains, short cytoplasmic tail, 1 (KIR2DS1),
mRNA 
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|602153008F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4294128 5' | 


AU134114 OVARC1 Homo sapiens cDNA clone OVARC1001296 5'


602069632F1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4212727 5'


602069632F1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4212727 5'


DKFZp761P092_r1 761 (synonym: hamy2) Homo sapiens cDNA clone DKFZp761P092 5'
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au93b08jd Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2783799 3' similar to 
TR:O60463 O60463 TYPE-2 PHOSPHATIDIC ACID PHOSPHOHYDROLASE. [1] ; 


xa07d12.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2567639 3' similar to contains
element OFR repetitive element ;


Homo sapiens centrosomal protein 2 (CEP2), mRNA


za36d05.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:294633 5'
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Homo sapiens Xq pseudoautosomal region; segment 1/2 j 


Human DNA for ceruloplasmin, exon 5


qv95c12.x1 NCI_CGAP_Ut2 Homo sapiens cDNA clone IMAGE:1989334 3' similar to TR:Q14673 Q14673
KIAA0164 PROTEIN. ;


7d76a04.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3278862 3' similar to TR:O95793 O95793
STAUFEN PROTEIN. ;


wl60b10.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2429275 3' similar to
SW:COGT_HUMAN P50281 MATRIX METALLOPROTEINASE-14 PRECURSOR ;
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601334790F1 NIH_MGC_39 Homo sapiens cDNA clone IMAGE:3688655 5' | 


Homo sapiens Chediak-Higashi syndrome 1 (CHS1), mRNA


Homo sapiens Chediak-Higashi syndrome 1 (CHS1), mRNA
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2t73a08.s1 Soares_testis_NHT Homo sapiens cDNA ctone IMAGE727958 3' similar to gb:S85B55 . 
PROHIBITIN (HUMAN); 
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QV3-DT0045-221 299-046-c07 DT0045 Homo sapiens cDNA j 


QV3-DT0045-221299-046-c07 DT0045 Homo sapiens cDNA


601452412F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3856179 5'


601452412F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3856179 5'
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Homo sapiens chromosome 21 segment HS21 C009 J 


wm33a11.x1 NCI_CGAP_Ut4 Homo sapiens cDNA clone IMAGE:2437724 3' similar to TR:O75457 O75457
CYTOSOLIC PHOSPHOLIPASE A2-GAMMA. ;


ne26d10.s1 NCI_CGAP_Co3 Homo sapiens cDNA clone IMAGE:882259 3' similar to TR:G1136434
G1136434 KIAA0187 PROTEIN. ;


Homo sapiens protocadherin beta 3 (PCDHB3), mRNA
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601431 238F1 NIH_MGC_72 Homo sapiens cDNA done IMAGE:3916569 5' j 


Top Hit 
Database 
Source 


EST_HUMAN


z


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


z


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


h-
z


NT


EST_HUMAN


EST_HUMAN


NT


z

i

UJ


EST_HUMAN


Top Hit Accession
No.


BE745597.1


in 

s 


D45032.1


AI367350.1


in 

s 

s 

CO 


1 

CO 
CO 

< 


BE563650.1


BE563650.1


11427235


11427235


AA403192.1


AA403192.1


AA398511.1


BE837593.1


AW364874.1


AW364874.1


BE612586.1


BE612586.1


AL163209.2


AL163209.2


AI884477.1


AA502294.1


11416799


d 

I 

co 
m 

< 


BE890797.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


0.0E+00


0.0E+00


1 

LU 
o 
d 


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


o 

? 
Ui 
o 
d 


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


0.0E+00


Expression
Signal


1.26


cm 


0.44


1.08 


CD 

CM" 


1.22 


1.29


1.29


1.93


1.93


1.35


1.35


3.69


0.53


1.25


1.25


1.26


1.26


in 

CD 


1.65


r» 
d 


in 

CO 

d 


0.57


0.99


1.97


ORF SEQ
ID NO:


34932


34946


34965


34983


34994


34966


35012


35013


35023


35024


35026


35027




35076


35077


36078


35097


35098


35116


35116


35123


35129




35140




Exon 
SEQ ID 
NO: 


21514 


21527[ 


21548 


21567 


2157b| 


21580 


21 593 | 


21 593 | 


21601] 


21601) 


21603 


21603 


21644 


21653| 


21654| 


s 

CO 
CM 


21673| 


216731 


21688| 


21688| 


21698 


21705 


217101 


21717 


21720j 


Probe 
SEQ ID 

NO: 


[ 8546 


[ 8559 


| 8578 


8599 


01-98 


8612 


s 
s 


| 8625 | 


j 8633 | 


8 
8 


10 

8 

CO 


8635 


8676 


1 8685| 


S 

i 


s 

CO 


f 8705 | 


8705 1 


[ 8720 [ 


| 8720! 


8730 


CO 
CO 


| 8742: 


j 8749 


1 8752 






WO 01/57276 



PCT/US01/00668



Top Hit Descriptor
Homo sapiens mitogen-activated protein kinase kinase kinase 13 (MAP3K13), mRNA
Human immunoglobulin-like transcript-3 mRNA, complete cds


Homo sapiens cep250 centrosome associated protein mRNA, complete cds


Homo sapiens cep250 centrosome associated protein mRNA, complete cds


AU131671 NT2RP3 Homo sapiens cDNA clone NT2RP3003016 5'


Homo sapiens immunoglobulin superfamily, member 2 (IGSF2), mRNA


HUM084C02B Clontech human fetal brain polyA+ mRNA (#8535) Homo sapiens cDNA clone GEN-084C02
5'


zt32e04.r1 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:724062 5'


601900571F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:4129744 5'


UI-H-BI1-adr-e-12-0-UI.s1 NCI_CGAP_Sub3 Homo sapiens cDNA clone IMAGE:2717687 3'


wa30b10.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2299579 3' similar to TR:O15044
O15044 KIAA0335. ;


CM1-TN0141-250800-439-b08 TN0141 Homo sapiens cDNA


Homo sapiens chromosome 21 segment HS21C101


601150051F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:3502836 5'


602127664F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4284542 5'


602127664F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4284542 5'


ESTJ-IUMAN | 


EST_HUMAN | 


IN 


IN 


IN 


Z 


NT I 


XN: 


NT | 


NT | 




z 


z 


ESTHUMAN | 


NT | 


ESTJ-IUMAN 


EST_HUMAN | 


EST HUMAN 


EST_HUMAN | 


EST_HUMAN | 


EST_HUMAN | 


NT 


EST_HUMAN | 


EST HUMAN | 


5 
=> 
X 

CO 
UJ 


EST HUMAN 


NT 


EST HUMAN 


EST HUMAN 


I 

X 
\~ 


Top Hit Accession
No.


AW245765.1


AW245765.1


4758695


4758695


U88084.1


U88084.1


AJ251760.1




X98922.1




U82979.1


AF022655.1


AF022655.1


1 

CO 

3 


11426572| 


AW513513.1 


BE783232.1 


D52650.1 


BE378495.1 


AA410545.1 | 


BF313946.1 


11424387 


AW139673.1


AW139673.1


AI640190.1 


BF377897.1 


8 


BE260272.1 


BF700165.1 


BF700155.1 


Most Similar 
(Top) Hit 
BLAST E 
Value 


O.OE+00| 


O.OE+OOl 


O.OE+OOl 


O.OE+OOl 


O.OE+00 1 


o 

o 
o 


O.OE+OOl 


O.OE+OOl 


0.0E+00| 


0.0E+00 1 


0.0E+00I 


0.0E+00 1 


O.OE+OOl 


O.OE+OOl 


O.OE+00| 


O.OE+00 


00+300 | 


O.OE+00 


O.0E+00I 


0.0E+00 1 


0.0E+00I 


O.OE+00 


0.0E+00| 


O.OE+OOl 


0.0E+00 


O.OE+00 ! 


0.0E+00 


Q 

? 
LU 
O 

o 


0.0E+00 


00+300 ! 




0.55 1 


• 0.55| 


2.62| 


2.62| 


0.52I 


0.52I 


1.02| 


CO 


CO 


CO 


1.82| 


1.16| 


1.16| 


0.68| 


0.81| 


1.53 


0.55, 


11.32 


3.89 


3.98' 


3.27 


1.37 


1.38 


1.38 


0.61 


3.23 


0.45 


2.33 


2.98 


2.98 


ORF SEQ 

ID NO: 


35166| 


351671 


35168| 


351 69 | 


35172| 


35173! 


35238| 


1 

CO 


35245I 


35246) 


35260| 


35305| 


35306| 


35308| 


35325| 






35328 


35361 | 


35365) 




35374 


35379! 


35380 1 




35398 


35410 


3541 4 i 


35418 


35419 


Exon
SEQ ID
NO:


21745| 


217451 


217461 


217461 


217501 


217501 


218181 


§ 

CM 


21823] 


21 823 | 


21838| 


21879) 


21879] 


21882) 


218971 


21901 


CO 

8 

CM 


21904 


21935! 


21941 | 


21943| 


21950 


21955| 


21955 


o 

CO 
CD 

CM 


21979 


21988 


1 
s 


21999 


21999 


Probe
SEQ ID
NO:


| 87781 


| 87781 


I 8779 | 


1 8779| 


1 87831 


I 87831 


j 88511 


I 

s 


CD 

in 
oo 

CO 


1 88561 


1 8871 1 


891 3 1 


I 8913| 


8916| 


j 8931 | 


in 

i 

CO 


I 8937I 


8938 


1 


I .89751 


[ 8977! 






! 


I 


CO 

o 

CO 


I 9022 


I 


co 

1 


j 9033 






Top Hit Descriptor


bb26c01.x1 NIH_MGC_5 Homo sapiens cDNA clone IMAGE:2964000 3'


AU132349 NT2RP3 Homo sapiens cDNA clone NT2RP3004260 5'


AU132349 NT2RP3 Homo sapiens cDNA clone NT2RP3004260 5'


UI-HF-BP0p-alr-f-05-0-UI.r1 NIH_MGC_51 Homo sapiens cDNA clone IMAGE:3072897 5'


601595558F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3949383 5'


601595558F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3949383 5'


Homo sapiens mRNA for KIAA1231 protein, partial cds


Homo sapiens mRNA for KIAA1231 protein, partial cds


Homo sapiens KIAA0345 gene product (KIAA0345), mRNA


DKFZp434L0120_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434L0120 5'


AU132349 NT2RP3 Homo sapiens cDNA clone NT2RP3004260 5'


Homo sapiens protocadherin alpha 12 (PCDH-alpha12) mRNA, complete cds


Homo sapiens leucocyte immunoglobulin-like receptor-1 mRNA, complete cds


MR4-TN0114-110900-101-e04 TN0114 Homo sapiens cDNA


601155227F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3138798 5'


601288351F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3613045 5'


601286351F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3613045 5'


xn72b01.x1 NCI_CGAP_CML1 Homo sapiens cDNA clone IMAGE:2699977 3' similar to gb:X02152_cds1 L-
LACTATE DEHYDROGENASE M CHAIN (HUMAN);


EST46740 Fetal kidney II Homo sapiens cDNA 5' end


Homo sapiens Chediak-Higashi syndrome 1 (CHS1), mRNA


EST376186 MAGE resequences, MAGH Homo sapiens cDNA


AU143673 Y79AA1 Homo sapiens cDNA clone Y79AA1002307 5'


AU143673 Y79AA1 Homo sapiens cDNA clone Y79AA1002307 5'


Homo sapiens killer cell inhibitory receptor KIRCI gene, exons 2, 3, and 4


Homo sapiens HEF like Protein (HEFL), mRNA


Homo sapiens HEF like Protein (HEFL), mRNA


AU136637 PLACE1 Homo sapiens cDNA clone PLACE1004737 5'


AU136637 PLACE1 Homo sapiens cDNA clone PLACE1004737 5'


Homo sapiens partial RANBP7 gene for RanBP7/importin7 and partial ZNF143 gene


Top Hit 
Database 


Source 


H 
Z 


ESTJHUMAN | 


EST_HUMAN | 


EST_HUMAN | 


EST_HUMAN | 


ESTJHUMAN | 


EST_HUMAN | 


Z 


l- 
z 


z 


ESTJHUMAN | 


ESTJHUMAN | 


EST_HUMAN | 


I- 
z 


z 


h- 
Z 


ESTJHUMAN | 


ESTJHUMAN | 


ESTJHUMAN I 


ESTJHUMAN 


ESTJHUMAN 


EST HUMAN 


i- 
z 


EST HUMAN 


EST HUMAN 


EST HUMAN 


z 


z 




ESTJHUMAN 


EST HUMAN 


z 


Top Hit Accession
No.


11424387 


BE206710.1 | 


AU132349.1 I 


AU132349.1 | 


AW 500936.1 ] 


BE740490.1 | 


BE740490.1 


AB033057.1 


AB033057.1 | 


7662067| 


I 

< 


AL041084.2


AU132349.1


AF152308.1


AF009220.1


AF009220.1


CO 

E 

CO 


BE280793.1 


BE388700.1 | 


BE388700.1 


AW236269.1 


AA341 305.1 


11427235; 


CO 

s 

CO 

i 


AU143673.1 


AU143673.1 


AFO72408.1 


11421001! 


1 1421 001 i 


AU136637.1 


AU136637.1 




Most Similar
(Top) Hit 
BLAST E 
Value 


00+30 0 


O.OE+00 1 


O.OE+00| 


O.OE+OOl 


O.OE+00 1 


Q 
O 
+ 

m 
o 
d 


0.0E+00 1 


"■" 0.0E+00| 


0.0E+00 1 


§ 
+ 

Ul 
o 
d 


O.OE+00| 


O.OE+OOl 


0.0E+00 1 


O.OE+00 1 


0.0E+00| 


O.0E+00I 


O.OE+OOl 


O.OE+OOl 


O.0E+00] 


O.OE+OOl 


O.0E+00 


o 
o 
+ 

UJ 

o 

d 


O.OE+00 ! 


O.OE+OOl 


O.OE+00| 


0.0E+00 


O.OE+OOj 


0.0E+00 1 


O.0E+O0| 


o 
o 

LU 

o 
d 


0.0E+00 


00+300 j 


Expression
Signal


1.71 


0.82| 


cvi 


CO 

cvi 


1.82| 


16.111 


CD 


0.45| 


0.45| 


1.76| 


<D 
CO 


d 


2.57| 


2.44] 


5.52| 


5.52I 


in 
cvi 


2.73| 




fj 


3.64 


0.75| 


0.63] 


0.75[ 


7.08) 


7.0S| 


13.11| 


CO 

cvi 


CO 
<N 


3.43| 


3.43 1 


2.24| 


ORF SEQ 
ID NO: 


36554 


36564| 


36583 1 


ot 

IO 


in 

CD 

co 


s 
1 


36603 1 


36604 1 


36605| 


36518| 


36638 1 


36644 | 


36651 | 


3 
S 


36680 1 


36661 | 


36694 1 


36720 | 


36726| 


36727| 


36733 


36734| 


36745I 


36763 | 


36774| 


36775I 


36778: 


36780 1 


36781 1 


36824* 


36825 


36839, 


CO 


23078 


23087I 


23103| 


23103] 


231121 


23118| 


23118| 


CO 

a 


23110| 


23131| 


O) 


231 54| 


23164| 


231 65 1 


23193| 


23193| 


23209 | 


23238] 


23247 | 


23247| 


23256 


23257| 


23266| 


23286| 




23299 1 


23302| 


23304j 


23304J 


23338| 


23338] 


23354 I 


Probe
SEQ ID
NO:


10153 


I 101621 


1 10178| 


I 101781 


| 10187| 


| 10193| 


| 10193| 


| 10194| 


| 10194| 


| 10206| 


■ 102241 


10229| 


| 10239! 


1 10240| 


| 10268| 


' 10268 | 


T 

CO 

s 


| 10314] 


| 10323| 


I 10323 | 


| 10332 I 


| 10333| 


| 10342| 


| 10363 j 


I 103761 


| 10376 1 


j 10379j 


| 10382| 


| 103821 


| 10416| 


] 10416! 


| 10432| 






Top Hit Descriptor


Homo sapiens partial RANBP7 gene for RanBP7/importin7 and partial ZNF143 gene


AV695712 GKC Homo sapiens cDNA clone GKCDXA07 5'


AV695712 GKC Homo sapiens cDNA clone GKCDXA07 5'


Homo sapiens killer cell inhibitory receptor KIRCI gene, exons 2, 3, and 4


zp97h11.r1 Stratagene muscle 937209 Homo sapiens cDNA clone IMAGE:628197 5'


Homo sapiens KIF4 (KIF4) mRNA, complete cds


601491565F1 NIH_MGC_69 Homo sapiens cDNA clone IMAGE:3893657 5'


601570712F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3845403 5'


601570712F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3845403 5'


AU127403 NT2RP2 Homo sapiens cDNA clone NT2RP2001212 5'


601645134F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:3630177 5'


601645134F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:3930177 5'


601432317F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3917453 5'


Homo sapiens neurexin III (NRXN3) mRNA


Homo sapiens hypothetical C2H2 zinc finger protein FLJ22504 (FLJ22504), mRNA


Homo sapiens mRNA for actin binding protein ABP620, complete cds


601105459F1 NIH_MGC_15 Homo sapiens cDNA clone IMAGE:2987918 5'


Homo sapiens mRNA for estrogen receptor beta, complete cds


Homo sapiens mRNA for estrogen receptor, beta, complete cds


602037045R1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4184939 5'


602037045F1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4184939 5'


601439713F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3924578 5'


601439713F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3924578 5'


AV716271 DCB Homo sapiens cDNA clone DCBBDC09 5'


AV716271 DCB Homo sapiens cDNA clone DCBBDC09 5'


Top Hit
Database
Source


NT I 


EST_HUMAN | 


EST_HUMAN ] 


IN, 


z 
< 
s 

3 

X 

io 

LU 


EST_HUMAN | 


EST_HUMAN | 


NT | 


ESTJHUMAN | 


1 
1 

Ul 


ESTJHUMAN | 


EST_HUMAN | 


ESTHUMAN | 


ESTJHUMAN | 


EST_HUMAN | 


ESTJHUMAN | 


IN I 


ESTJHUMAN | 


NT i 


NT | 


EST_HUMAN | 


ESTJHUMAN | 


NT | 


H 

z 


EST_HUMAN 


z 


ESTJHUMAN | 


ESTJHUMAN | 


ESTJHUMAN | 


EST_HUMAN | 


EST HUMAN | 


EST HUMAN 


i 

X 


No. 


AJ295844.1 I 


AV695712.1 j 


AV695712.1 | 


AF072408.1 | 


AA196387.1 | 


AA131248.1 | 


AA1 31 248.1 I 


AF1 79308.1 | 


BE880658.1 | 


BE730772.1 | 


BE730772.1 | 


AU127403.1 j 


BE958511.1 ] 


BE958511.1 j 


BE897487.1 ( 


AA31 1624.1 | 


4758827] 


BE891113.1 ] 


11560151] 


1 


BE304522.1 | 


BE304522.1 | 


AB006590.1 | 


o 


AA704457.1 


M22921.1 i 


BF340331.1 i 


BF340331.1 | 


BE897149.1 


BE897149.1 


AV716271.1 I 


AV716271.1 


Most Similar 
(Top) Hit 
BLAST E 
Value 


O.OE+OOl 


O.OE+OOl 


O.OE+00 1 


0.0E+00 1 


0.0E+00I 


o 
o 

itl 

o 

D 


O.OE+OOl 


0.0E+00j 


o 
¥ 

d 


O.OE+OOl 


O.OE+OOl 


O.OE+O0| 


o 
o 

UJ 

o 
d 


O.OE+OOl 


0.0E+00| 


o 

I 

d 


O.0E+00| 


O.OE+OOl 


0.0E+00 1 


O.0E+00| 


O.OE+00 1 


O.0E+00| 


0.0E+00I 


O.0E+OO| 


O.OE+00 


O.OE+00I 


O.OE+00 j 


O.OE+00 1 


O.OE+OOj 


00+30*0 | 


O.OE+OOj 


O.OE+OOj 


Expression 
Signal 


2.24I 


0.75] 


0.75 1 


0.76 1 


2.64 1 


1.78| 


1.78| 


1.79| 


0.88| 


11.49| 


11.49| 


0.62J 


0.86| 


0.86| 


0.98| 


0.68] 


8 

d 


0.78] 


1.10| 


1.39| 


CD 

d 


CO 

o 


4.13] 


4.13| 


1.27 


1.19| 


4.52| 


4.52| 


5.24I 


5.24] 


0.48] 


0.48| 


ORF SEQ
ID NO:


36840) 


36847 | 


36848| 


36855| 


36858| 


36887| 


36888| 




36978 1 


36987| 


36988 | 


36992 1 


37003 | 


37O04| 


37023 | 


37037 | 


37038I 


37051] 


37054 1 


37060 | 


37061 1 


37062| 


37067| 


37068| 


37077 


37078] 


37081 | 


37082] 


37103] 


371 04| 


37134 j 


. 37135| 


lis 

to 


23354I 


23359 | 


o 
in 

8 


23365 1 


23367 | 


23392 I 


23392 | 


23439j 


23483] 


23496| 


23495| 


23500 | 


23510| 


23510| 


23527| 


23538] 


8 

m 


23551 | 


23554 1 


23564 | 


23565] 


1 

CO 
CN 


23572| 


23572| 


23580 




23584| 


23584| 


23609I 


23609I 


23641 


i 


Probe
SEQ ID
NO:


| 10432 1 


| 104371 


j 104371 


| 10443| 


| 10445| 


j 10470| 


| 10470| 


| 10517| 


| 10581 | 


| 10573| 


| 10573| 


| 10578 | 


| 10588| 


! 10588| 


| 10605| 


| 10016] 


| 10617] 


| 10629| 


| 10632] 


| 10642| 


] 10643 1 


| 10643| 


| 10650 1 


| 106501 


89901. 


§ 

o 


| 10662 


( 10662 


CO 

to 
o 


| 10687 


| 10719 


| 10719 






Homo sapiens mRNA for KIAA1316 protein, partial cds


Homo sapiens mRNA for KIAA1316 protein, partial cds


Homo sapiens retinoblastoma-like 2 (p130) (RBL2), mRNA


Homo sapiens retinoblastoma-like 2 (p130) (RBL2), mRNA


23_08 Human Epidermal Keratinocyte Subtraction Library- Upregulated Transcripts Homo sapiens cDNA
clone 23_08 5' similar to Homo sapiens cyclin B2 (CCNB2)


bb73h05.y1 NIH_MGC_2 Homo sapiens cDNA clone IMAGE:3048057 5' similar to SW:CD97_HUMAN
P48960 LEUCOCYTE ANTIGEN CD97 PRECURSOR. [1] ;


ae74g04.s1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:969942 3'


Homo sapiens eukaryotic translation initiation factor 5A (EIF5A) mRNA


602134132F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4289502 5'


dr04g05.x1 NIH_MGC_3 Homo sapiens cDNA clone IMAGE:2847177 5'


Human gamma actin-like pseudogene, complete cds


wf20e11.x1 Soares_Dieckgraefe_colon_NHUC Homo sapiens cDNA clone IMAGE:2351180 3' similar to
gb:M87789 IG GAMMA-1 CHAIN C REGION (HUMAN);


Human beta-prime-adaptin (BAM22) gene, exon 16


Homo sapiens golgin-like protein (GLP), mRNA


601116705F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3357384 5'


ba04d07.y1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:2823373 5' similar to TR:O76022 O76022 E1B
55KDA-ASSOCIATED PROTEIN. ;


ba04d07.y1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:2823373 5' similar to TR:O76022 O76022 E1B
55KDA-ASSOCIATED PROTEIN. ;


QV0-CT0225-101299-071-T06 CT0225 Homo sapiens cDNA


nl42c08.s1 NCI_CGAP_Pr4 Homo sapiens cDNA clone IMAGE:1043342 similar to gb:M95178 ALPHA-
ACTININ 1, CYTOSKELETAL ISOFORM (HUMAN);


wp06g08.x1 NCI_CGAP_Kid12 Homo sapiens cDNA clone IMAGE:2464094 3'


Top Hit
Database
Source


ESTJHUMAN | 


NT I 


,NT | 


S IN 


I IN 


ESTHUMAN 


EST HUMAN 


EST_HUMAN | 


z 


z 

1 
1 

UJ 


EST_HUMAN | 


'NT | 


ESTJHUMAN 


ESTJHUMAN | 


'EST HUMAN | 


(ESTJHUMAN | 


i- 
z 


NT | 


'ESTJHUMAN | 


INT 1 


| ESTHUMAN | 


| ESTJHUMAN | 


EST HUMAN 


EST HUMAN 


i 

I 

Ul 


EST HUMAN 


| ESTJHUMAN | 


|EST_HUMAN | 


Top Hit Accession
No.


BF240536.1 | 


'AB037737.1 | 


IAB037737.1 ] 


1.1430868| 


114308661 


BE122764.1 


BE01 7960.1 


:AA772837.1 | 


i 4503544] 


IBF576267.1 j 


,AW328173.1 | 


'M55083.1 | 


i 

CO 

< 


,BF306996.1 | 


BF306996.1 | 


IBF362462.1 | 


,U36264.1 | 


CD 

§ 

Z> 


e 

8 
S 

CQ 


CO 

s 
§ 


IBF207682.1 | 


IBE257744.1 | 




I 


|AW753028.1 j 


AA558707.1 


i 

co 

1 


[AW327895.1 j 


Most Similar 
(Top) Hit 


BLAST E 

Value 


O.OE+00 1 


1 O.OE+OO] 


0.0E+00| 


O.0E+00 1 


O.OE+00 1 


0.0E+00 


00+30*0 


O.OE+00 1 


1 00+30*0 


O.0E+00 1 


O.OE+OO | 


0.0E+00 j 


0.0E+00 


O.OE+00 1 


O.OE+00 1 


0.0E+00| 


j O.OE+00| 


O.OE+00| 


0.0E+00 1 


O.OE+001 


o 
o 

tu 

o 
d 


O.OE+00 1 


O.OE+00 


0.0E+O0 


0.0E+00 1 


O.OE+00 


1 O.OE+00 1 


f O.0E+00I 


Expression
Signal


5.42| 


I 1.66| 


I 1-68| 


3.41 1 


3.41| 


2.06 


n 

CO 


8 
oi 


CD 


2.25| 


IO 
lO 


s 


159.29 


s 




59.51| 


cm! 


CM* 


4.74| 


! 1.54| 


a 


CM 
CO 


4.13 


4.13 


CO 

c*> 




1 3.12| 


a 


ORF SEQ
ID NO:




I 38075| 


| 38076| 


| 38070] 


38080 1 


38085 


38086 


' 38089| 


38103) 


j 38110| 


38114| 




38123 


! 38124| 


38125| 


38133 | 


38154] 


38155] 




| 38177| 






38225 


38226 


38228| 




i 31322| 


38234| 


Exon
SEQ ID
NO:


| 24510] 


| 24520| 


j 24520) 


i 24524j 


! 24524| 


co 
CN 
IO 

eg 


8 

IO 
CM 


24533 | 


CM 


24550 | 


, 24553 1 


24558 | 


24562 


24563 | 


co 

CD 

to 

CM 


24569| 


24585] 


24585| 


! 24591 | 


24601 | 


M- 

s 

CM 


| 24605| 


24648 


CM 


| 24650] 


24655 


j 18451! 


| 24656| 


Probe
SEQ ID
NO:




s 


8 


CO 

S 


CO 

8 


o 

OL> 
IO 


o> 

IO 


tO 

CO 

in 


to 


CM 
5 


tn 
5 


o 

CM 
CD 


I 


8 


to 

s 


S 


1 


00 

s 


s 


to 

8 


CO 
CD 
CD 


1 


s 


CM 

s 


T 

co 

CD 


at 

s 


o 

s 






Top Hit Descriptor


Homo sapiens neurexin III (NRXN3) mRNA


601659088R1 NIH_MGC_70 Homo sapiens cDNA clone IMAGE:3895916 3'


IL5-HT0731-020500-077-f05 HT0731 Homo sapiens cDNA


DKFZp434G178_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434G178 5'


MR4-BT0358-130900-016-a04 BT0358 Homo sapiens cDNA


nz11c07.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1287468 3' similar to TR:Q13686
Q13686 ALKB HOMOLOG PROTEIN. ;


nz11c07.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1287468 3' similar to TR:Q13686
Q13686 ALKB HOMOLOG PROTEIN. ;


7f27T1Zx1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:3295919 3' similar to TR:O00409 O00409
CHECKPOINT SUPPRESSOR 1. ;


601279335F1 NIH_MGC_39 Homo sapiens cDNA clone IMAGE:3811144 5'


AV757420 BM Homo sapiens cDNA clone BMFAGH03 5'


Homo sapiens polycystic kidney disease-associated protein (PKD1) gene, complete cds


Homo sapiens polycystic kidney disease-associated protein (PKD1) gene, complete cds


AU138211 PLACE1 Homo sapiens cDNA clone PLACE1008077 5'


601441096F1 NIH_MGC_72 Homo sapiens cDNA clone IMAGE:3916270 5'


601572186T1 NIH_MGC_55 Homo sapiens cDNA clone IMAGE:3839012 3'


601572186T1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3839012 3'


AU141882 THYRO1 Homo sapiens cDNA clone THYRO1001398 5'


AU141882 THYRO1 Homo sapiens cDNA clone THYRO1001398 5'


wz91h01 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2566225 3' similar to WP:F53H10.2
CE11040 ZINC FINGER, C2H2 TYPE ;


7h22b10.x1 NCI_CGAP_Co16 Homo sapiens cDNA clone IMAGE:3316699 3' similar to TR:Q13458 Q13458
TRIO. ;


Top Hit 
Database 


Source 


ESTJHUMAN | 


'NT I 


I ESTJHUMAN ] 


ESTJHUMAN | 


EST HUMAN | 


i ' 
I □ 


ESTJHUMAN ] 


ESTJHUMAN 


ESTJHUMAN 


X 

is 

UJ 


.EST HUMAN 1 


EST HUMAN 


j ESTJHUMAN | 


i 
I 


|EST HUMAN | 


INT I 


|NT | 


|EST_HUMAN | 


| ESTJHUMAN | 


'EST_HUMAN ] 


I ESTJHUMAN j 


j ESTJHUMAN | 


|EST_HUMAN | 


| ESTJHUMAN | 


EST HUMAN 


EST HUMAN 


Z 

i 1 


Top Hit Accession
No.


,AW292776.1 ] 


4756827| 


a 

CD 
LU 
CD 


1 

UJ 

co 


(BE185656.1 j 


IO I 

CD < 

l\ 


|BF082504.1 ■ \ 


AI923116.1 


AA760913.1 


AA760913.1 


[BE910546.1 ! 


BE676347.1 


|BE61 5566.1 \ 


[BE615666.1 


IAV757420.1 | 


|U9891.1 


|L39891.1 


(AU138211.1 j 


|BE622317.1 i 


[AI939634.1 | 


i 

CO 

CO 


1BE748B99.1 


(AU141882.1 | 


|AU141882.1 I 


o 

1 


BF002333.1 


IALO43705.1 | 


Most Similar 
(Top) Hit 
BLAST E 


Value 


! O.OE+00 1 


0.0E+00| 


j O.OE+OOj 


| O.OE+00 j 


| O.OE+00 1 


| O.OE+OOl 


I O.OE+Oo] 


O.OE+00 


O.OE+00 


0.0E+00 


| O.OE+OOl 


0.0E+0O 


\ O.OE+00 1 


O.OE+00 1 


3 
+ 

UJ 

o 

o 


I O.0E+0O| 


I O.OE+00 1 


I O.OE+OOj 


j O.OE+00] 


[ O.OE+00 1 


! O.OE+00 1 


! O.OE+00| 


l00+30'0 | 


I O.OE+OOl 


O.OE+00 


O.OE+00 


! O.OE+00] 


Expression
Signal


I 1.63! 


j 2.09| 


1 2.43] 


1 2.43| 


I 3.67| 


in u 


i 2.29 


19.22 


8.71 


8.71 


I 3.511 


5.45 


I 2.02| 


I .2.02| 


t 2.13| 


I 5.01| 


| 5.01| 


I 3.67| 


I 5.81! 


i 2.22| 


| 14.23] 


S 14.231 


I 2.54I 


lo- 
rd 


N. 
O) 


2.38 


1 1-48, 


ORF SEQ
ID NO:


| 38254] 


| 37432| 


j 38186] 


| 381 87| 


! 38186| 


I 38202) 


38211| 


38214 


o 

CO 


36260 


1 38266| 


37440 


37443 | 


37444 | 


37453| 


38273| 


38274 | 


38287| 


38304| 


38326| 


I 

CO 


§ 

co 

CO 


38343 | 


| 38344| 


38347 


38349 


j 38366| 


Exon 
SEQ ID 
NO: 


| 25706 | 


| 2391 5| 


| 2461 Oj 


| 24610 | 


| 24611| 


I 24624I 


, 24631 1 


24635 


24681 


24681 


CO 
CO 
CD 


23921 


] 23924| 


i 23924| 


1 239321 


] 24893 | 


| 24693 | 


1 24705| 


] 24718| 


| 24742 1 


I 24750| 


| 24750| 


| 24759] 


| 24759] 


24762 


25707 


| 24780 1 


Probe 
SEQ ID 

NO: 




CO 


3 




B 




IS 


S 
I-. 


s? 


CO 

se 


co 


I 


8 


1 




1 


I 


8 


Iseeu I 


o 


I 


s 

CO 


I s - 
I s - 

CO 


r»" 
h* 
CO 


o 


s 


1 

















Top Hit Descriptor 


Homo sapiens cleavage and polyadenylation specific factor 1, 160kD subuni


Homo sapiens chromosome 21 segment HS21 0048 


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA 


Homo sapiens calcineurin binding protein 1 (KIAA0330), mRNA


Homo sapiens DKFZp434P211 protein (DKFZP434P211), mRNA


Homo sapiens period (Drosophila) homolog 3 (PER3), mRNA


Human endogenous retrovirus pHE.1 (ERV9)

Homo sapiens chromosome 12 open reading frame 3 (C12ORF3), mRNA


Top Hit 
Database 
Source 


!c 


Z 






Z 


Z 




c 
o 


to 
m 




CO 

1 


W 


in 


r- 

CO 


i 

s 


Top Hit Ace 
No. 




AL1 63246.2 


o 

S 


1 




s 


_ 8 

i 


Most Similar 
(Top) Hit 
BLAST E 
Value


O.OE+00 1 


O.OE+00 1 


O.0E+00| 


O.OE+OOl 


O.OE+OOl 


O.OE+OOl 


O.OE+00 
O.OE+00 


Expression 
Signal 


8 

c\i 


% 
co 


cS 




s 

CO 




1.63 
1.4 


ORF SEQ
ID NO:


31733| 




§ 


CO 




31685j 


s 


Exon 
SEQ ID 
NO: 


25488| 




CN 

s 

CO 


I 


25568 1 


o 

s 


256131 
14203! 


Probe 
SEQ ID 
NO: 


CM 

8 


Si 

8 




to 

CO 

a 


CO 

o 

CO 


In 
8 


13082 
13103 






GENOMIC SEQUENCE DATABASE

IDENTIFY FUNCTIONAL SEQUENCES WITHIN GENOMIC DATA

IDENTIFY SUBSET OF FUNCTIONAL SEQUENCES FOR CHOSEN ASSAY

CREATE PHYSICAL AND/OR INFORMATIONAL SUBSTRATE AND PERFORM CHOSEN ASSAY

ANNOTATE SEQUENCE DATA

SEQUENCE ANNOTATION FROM EXTERNAL SOURCES

STORE ANNOTATED SEQUENCE DATA

DISPLAY ANNOTATED SEQUENCE

Fig. 1






QUERY GENOMIC SEQUENCE DATABASE

YES

PREPROCESS SEQUENCE FOR DESIRED APPROACH AND METHOD

PROCESS SEQUENCE

PREPARE (CONSENSUS) OUTPUT

OUTPUT TO PROCESS 300

REPORT RESULT

END

DISPLAY OUTPUT

Fig. 2






Fig. 3 






Grail
Genefinder
Diction
EST hit
Chip seq.

Expression signal for 3 tissues
Base number



Fig. 4 






[Plot; horizontal axis: Length]



Fig. 5 







Fig. 6 






[Plot; axis label: Signal Intensity]




Fig. 10




(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 




(43) International Publication Date: 9 August 2001 (09.08.2001)

(10) International Publication Number: WO 01/057276 A3



(51) International Patent Classification 7: C12Q 1/68



(21) International Application Number: PCT/US01/00668

(22) International Filing Date: 30 January 2001 (30.01.2001) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data:

60/180,312    4 February 2000 (04.02.2000)    US
60/207,456    26 May 2000 (26.05.2000)    US
09/608,408    30 June 2000 (30.06.2000)    US
09/632,366    3 August 2000 (03.08.2000)    US
60/234,687    21 September 2000 (21.09.2000)    US
60/236,359    27 September 2000 (27.09.2000)    US
0024263.6    4 October 2000 (04.10.2000)    GB



(71) Applicant (for all designated States except US): AEOMICA,
INC. [US/US]; 928 East Arques Avenue, Sunnyvale,
CA 94085 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): PENN, Sharron,
G. [GB/US]; 617 South Delaware Street, San Mateo, CA
94402 (US). HANZEL, David, K. [US/US]; 988 Loma
Verde Avenue, Palo Alto, CA 94303 (US). CHEN, Wensheng
[CN/US]; 210 Easy Street #25, Mountain View, CA
94043 (US). RANK, David, R. [US/US]; 117 El Dorado
Commons, Fremont, CA 94539 (US).



(74) Agent: RONNING, Royal, N., Jr.; Amersham Pharma- 
cia Biotech, Inc., 800 Centennial Avenue, Piscataway, NJ 
08855 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ, 
DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, 
HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, 
LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, 
NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, 
TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published: 

— with international search report 

— with sequence listing part of description published separately
in electronic form and available upon request from
the International Bureau

(88) Date of publication of the international search report:

9 January 2003

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: HUMAN GENOME-DERIVED SINGLE EXON NUCLEIC ACID PROBES USEFUL FOR ANALYSIS OF GENE
EXPRESSION IN HUMAN BONE MARROW

(57) Abstract: A single exon nucleic acid microarray comprising a plurality of single exon nucleic acid probes for measuring gene
expression in a sample derived from human bone marrow is described. Also described are single exon nucleic acid probes expressed
in the bone marrow and their use in methods for detecting gene expression.



INTERNATIONAL SEARCH REPORT 



International Application No

PCT/US 01/00668 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC 7 C12Q1/68 




According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

IPC 7 C12Q

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

BIOSIS, WPI Data, EPO-Internal, MEDLINE, EMBASE, CHEM ABS Data, SEQUENCE SEARCH



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category *   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.



STAUDER R ET AL: "Different CD44 splicing
patterns define prognostic subgroups in
multiple myeloma."
BLOOD, (1996) VOL. 88, NO. 8, PP. 3101-8.
JOURNAL CODE: A8G. ISSN: 0006-4971.,
XP002182129
Basel Institute for Immunology,
Switzerland.
the whole document


13



Further documents are listed in the continuation of box C.

[X] Patent family members are listed in annex.



* Special categories of cited documents:

'A' document defining the general state of the art which is not
considered to be of particular relevance

'E' earlier document but published on or after the international
filing date

'L' document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

'O' document referring to an oral disclosure, use, exhibition or
other means

'P' document published prior to the international filing date but
later than the priority date claimed

'T' later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

'X' document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

'Y' document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

'&' document member of the same patent family



Date of the actual completion of the international search

2 August 2002



Date of mailing of the international search report

05.09.02



Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016



Authorized officer



Luzzatto, E



Form PCT/ISA/210 (second sheet) (July 1992)






C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT


Category *   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.


X 


MACKAY C R ET AL: "EXPRESSION AND 
MODULATION OF CD44 VARIANT ISOFORMS IN 
HUMANS" 

JOURNAL OF CELL BIOLOGY, ROCKEFELLER 
UNIVERSITY PRESS, NEW YORK, US,
vol. 124, no. 1/2, 1994, pages 71-82, 
XP000471699 
ISSN: 0021-9525 
the whole document 


13 


X 


SCREATON G R ET AL: "GENOMIC STRUCTURE OF 
DNA ENCODING THE LYMPHOCYTE HOMING 
RECEPTOR CD44 REVEALS AT LEAST 12 
ALTERNATIVELY SPLICED EXONS"
PROCEEDINGS OF THE NATIONAL ACADEMY OF 
SCIENCES OF USA, NATIONAL ACADEMY OF 
SCIENCE. WASHINGTON, US, 
vol. 89, no. 24, 

15 December 1989 (1989-12-15), pages 
12160-12164, XP000470187 
ISSN: 0027-8424 
abstract; table 1 


13 


X


DATABASE EBI [Online]
EMBL; 

Accession Number HSPA10C6 (Z77862), 

5 August 1996 (1996-08-05) 

MUNGALL AJ ET AL. : "H. sapiens flow-sorted 

chromosome 6 TaqI fragment, SC6pA10C6" 

XP002182130 

abstract 


13 


X 


O'CONNOR H E ET AL: "Abnormalities of the 
ETV6 gene occur in the majority of 
patients with aberrations of the short arm 
of chromosome 12: a combined PCR and 
Southern blotting analysis." 
LEUKEMIA, (1998 JUL) 12 (7) 1099-106., 

XP001022502
p. 1099, col. 2, last par. -p. 1101, col. 
2, 1st full par. 
column 2; figure 1 


13 


X 


DATABASE EBI [Online]
EMBL; 

Accession Number AC007372, 
27 April 1999 (1999-04-27) 
XP002182131 
abstract 


13 


A 


WO 99 33979 A (CHIRON CORP) 

8 July 1999 (1999-07-08) 

page 1, line 19 -page 8, line 26; claims 

19-21 

-/- 


1,12 



Form PCT/ISA/210 (continuation of second sheet) (July 1992)





A 


US 5 618 671 A (LINDSTROEM PER) 

8 April 1997 (1997-04-08) 

column 1, line 60 -column 2, line 19 

column 4, line 36 -column 5, line 29; 

claims 


1-27 


A 


EISEN M B ET AL: "Cluster analysis and
display of genome-wide expression 
patterns" 

PROCEEDINGS OF THE NATIONAL ACADEMY OF 

SCIENCES OF USA, NATIONAL ACADEMY OF 

SCIENCE. WASHINGTON, US, 

vol. 95, December 1998 (1998-12), pages 

14863-14868, XP002140966 

ISSN: 0027-8424 

the whole document 


1,12 


A 


WO 92 13075 A (GENETICS INST)

6 August 1992 (1992-08-06) 

page 28, line 4 - line 6; claims 


1,12,13 


A 


SOLOVYEV V V ET AL: "PREDICTING INTERNAL
EXONS BY OLIGONUCLEOTIDE COMPOSITION AND 
DISCRIMINANT ANALYSIS OF SPLICEABLE OPEN 
READING FRAMES" 

NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY 
PRESS, SURREY, GB, 

vol. 22, no. 24, 1994, pages 5156-5163, 
XP002915964 
ISSN: 0305-1048 
the whole document 


1-27 


A 


GUAN ET AL: "GRAIL: an integrated
artificial intelligence system for gene
recognition and interpretation"
PROCEEDINGS OF THE CONFERENCE ON 
ARTIFICIAL INTELLIGENCE APPLICATIONS. 
MONTEREY, MAR. 2-6, 1992, LOS ALAMITOS, 
IEEE COMP. SOC. PRESS, US, 
vol. CONF. 8, 2 March 1992 (1992-03-02), 
pages 9-13, XP010027422 
ISBN: 0-8186-2690-9 
the whole document 


1-27 


P,X 


PENN S G ET AL: "Mining the human genome 
using microarrays of open reading frames." 
NATURE GENETICS, (2000 NOV) 26 (3) 315-8., 

XP002183793 
the whole document 

-/- 


1 






DATABASE EBI [Online]
9 May 1997 (1997-05-09)
MARRA M. ET AL.: "The WashU-HHMI mouse EST
project; vc72c02.s1 Knowles Solter mouse 2
cell Mus musculus cDNA clone IMAGE:780098"
Database accession no. AA414703
XP002208274
abstract

Relevant to claim No.: 13, 14, 18, 20, 21


DATABASE EBI [Online]
16 October 1997 (1997-10-16)
MARRA M. ET AL.: "The WashU-HHMI mouse EST
project; vl60c06.s1 Knowles Solter mouse 2
cell Mus musculus cDNA clone IMAGE:976618"
XP002208275
abstract

Relevant to claim No.: 13, 14, 16, 18, 20, 21


DATABASE EBI [Online]
27 April 1999 (1999-04-27)
DICKHOFF R. ET AL.: "Sequencing of human
chromosome 14q31 region"
XP002208276
abstract

Relevant to claim No.: 13, 14, 18






Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 



This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claims Nos.:
       because they relate to subject matter not required to be searched by this Authority, namely:

2. [X] Claims Nos.:
       because they relate to parts of the International Application that do not comply with the prescribed requirements to such
       an extent that no meaningful International Search can be carried out, specifically:

       see FURTHER INFORMATION sheet PCT/ISA/210

3. [ ] Claims Nos.:
       because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 



This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet

1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all
       searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
       of any additional fee.

3. [X] As only some of the required additional search fees were timely paid by the applicant, this International Search Report
       covers only those claims for which fees were paid, specifically claims Nos.:

       1-27

4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
       restricted to the invention first mentioned in the claims; it is covered by claims Nos.:



Remark on Protest   [ ] The additional search fees were accompanied by the applicant's protest.

                    [X] No protest accompanied the payment of additional search fees.



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998)



FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-27 (partially)

A probe comprising the nucleotide sequence SEQ ID 1 (see
claim 13), or a fragment thereof having a length of at least
15 bp (see ISA form 206), in particular comprising the
sequence SEQ ID 13115 (see p. 92 of the description, which
indicates that this sequence corresponds to the exon
comprised in SEQ ID 1), spatially addressable set of probes
comprising the said sequence (claim 1), microarrays
comprising said sequence (claim 12), a method for measuring
gene expression (claim 22), a method for identifying exons
(claim 23) and a method for assigning exons to a single gene
(claim 24) comprising using the said arrays, a peptide
encoded by SEQ ID 1 or 13115 (claims 26-27), in particular
having the sequence SEQ ID 26013, which is the translation
from SEQ ID 13115 (see p. 74 of the description).



2. Claims: 1-27 (partially)

A probe comprising the nucleotide sequence SEQ ID 2, or a
fragment thereof having a length of at least 15 bp (see ISA
form 206), in particular comprising the sequence SEQ ID
13116 (see p. 92 of the description, which indicates that
this sequence corresponds to the exon comprised in SEQ ID
2), spatially addressable arrays comprising the said
sequence, a method for measuring gene expression, a method
for identifying exons and a method for assigning exons to a
single gene comprising using the said arrays, a peptide
encoded by SEQ ID 2 or 13116, in particular having the
sequence SEQ ID 26014, which is the translation from SEQ ID
13116 (see p. 74 of the description).



Inventions 3-13114: similar subject-matter as above,
related to SEQ IDs 3-13114.






Continuation of Box I.2



The following statements concerning the impossibility of performing a
meaningful search according to Art. 17(2) PCT are made for the
subject-matter for which a search has been performed and which has been
identified as Inventions 1 and 2 in PCT form 206.

1) Claims 1-3, 5, 6, 8-15 and 18-24 relate to fragments of undisclosed
length or characteristics which cannot therefore be meaningfully
searched. These claims have thus been searched only insofar as related to
fragments having a length of at least 15 nt (see claim 15 and description
page 10, l. 15-22).

2) Present claims 1-12 and 22-24 relate to an extremely large number of
possible sets of nucleic acid probes comprising SEQ ID 1 or 2 and
microarrays comprising the said sets. Therefore, the claims lack clarity
and conciseness (Art. 6 PCT) to such an extent as to render a meaningful
search over their whole scope impossible. Consequently, with respect to
the said sets and microarrays the search has been carried out only
insofar as related to the SEQ ID 1 and 2 as such.

3) In view of the absence of any indication as to which other peptides
could be encoded by SEQ ID 1 and 2, the search with respect to claim 26
has been limited to the peptide sequences actually disclosed in the
application, i.e. 26013 and 26014 (Art. 6 PCT).

4) Claims 15-21 relate to nucleic probes, solely defined in that they
code for a polypeptide having the sequence SEQ ID 26013 or 26014.
However, a peptide is potentially coded by an extremely large number of
nucleic acid sequences. Hence, claims 15-21 lack clarity and conciseness
to such an extent as to render a meaningful search over their whole scope
impossible. The search has thus been limited to SEQ ID 1, 2, 13115 and
13116.
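
The degeneracy argument in item 4 can be made concrete with a small calculation. The following is an illustrative sketch only and is not part of the search report: it multiplies the number of synonymous codons per residue in the standard genetic code to count how many distinct coding sequences could encode a given peptide. The function name `count_encodings` and the six-residue example peptide are hypothetical, and stop codons are ignored.

```python
# Number of sense codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the counts sum to 61).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Return how many distinct nucleotide sequences encode `peptide`.

    Each residue contributes one factor: its number of synonymous codons.
    Raises KeyError for characters that are not standard amino acids.
    """
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


if __name__ == "__main__":
    # Hypothetical short peptide, chosen only for illustration.
    example = "MSRLAG"
    print(example, count_encodings(example))  # 1*6*6*6*4*4 = 3456
```

Even this six-residue example admits 3456 coding sequences, and the count grows multiplicatively with peptide length, which is why a claim defined only by the encoded polypeptide covers a very large set of nucleic acid sequences.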

The applicant's attention is drawn to the fact that claims, or parts of
claims, relating to inventions in respect of which no international
search report has been established need not be the subject of an
international preliminary examination (Rule 66.1(e) PCT). The applicant
is advised that the EPO policy when acting as an International
Preliminary Examining Authority is normally not to carry out a
preliminary examination on matter which has not been searched. This is
the case irrespective of whether or not the claims are amended following
receipt of the search report or during any Chapter II procedure.






Patent document             Publication    Patent family           Publication
cited in search report      date           member(s)               date

WO 9933979         A        08-07-1999     AU [illegible]    A     19-07-1999
                                           EP 1042470        A2    11-10-2000
                                           JP 2002511231     T     16-04-2002
                                           WO 9933979        A2    08-07-1999
                                           US 2002034800     A1    21-03-2002

US 5618671         A        08-04-1997     AT 204331         T     15-09-2001
                                           DE 69330604       D1    20-09-2001
                                           DE 69330604       T2    04-07-2002
                                           EP 0647278        A1    12-04-1995
                                           JP 7508407        T     21-09-1995
                                           WO 9400597        A1    06-01-1994

WO 9213075         A        06-08-1992     US 5326558        A     05-07-1994
                                           AT 144780         T     15-11-1996
                                           AU 640686         B2    02-09-1993
                                           AU 6295790        A     11-03-1991
                                           CA 2064738        A1    09-02-1991
                                           DE 69029040       D1    05-12-1996
                                           DE 69029040       T2    27-02-1997
                                           DK 15492          A     07-02-1992
                                           DK 487613         T3    25-11-1996
                                           EP 0487613        A1    03-06-1992
                                           EP 0732401        A2    18-09-1996
                                           ES 2094757        T3    01-02-1997
                                           HU 63441          A2    30-08-1993
                                           JP 5500211        T     21-01-1993
                                           KR 184235         B1    01-04-1999
                                           WO 9102001        A1    21-02-1991
                                           WO 9213075        A1    06-08-1992
                                           AU 1441792        A     27-08-1992
                                           EP 0567599        A1    03-11-1993
                                           JP 6505631        T     30-06-1994



Form PCT/ISA/210 (patent family annex) (July 1992)



